Discussion:
Convert those dastardly curly quotes to straight quotes on Windows?
harry newton
2017-10-07 21:38:03 UTC
How can we convert those dastardly curly quotes to straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>

I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes drive
me nuts!

Here is a screenshot of a sample cut and paste:
<http://i67.tinypic.com/2h5mjbr.jpg>

I tried cutting from the web and pasting into MS Word and then cutting from
MS Word and pasting into the text file - but the dastardly curly quotes
were still there.

I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.

Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?

Here's just one sample but the web is filled with dastardly curly quotes!
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
Auric__
2017-10-07 22:57:53 UTC
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>
I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes
drive me nuts!
<http://i67.tinypic.com/2h5mjbr.jpg>
I tried cutting from the web and pasting into MS Word and then cutting
from MS Word and pasting into the text file - but the dastardly curly
quotes were still there.
I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.
Since almost every technical web site uses the dastardly curly quotes,
how can I just get *rid* of them using a Windows method so that I can
have a text file that contains normal quotes?
Here's just one sample but the web is filled with dastardly curly quotes!
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
Copy the text to your text editor (or if already in a text file, open the
file). Select a "curly quote", copy it. Replace all, paste the copied curly
into the "find this" box, and then type a regular quote in the "replace it
with this" box, replace all. Repeat for the "close curly quotes".
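Scripted, the same procedure is one "replace all" per curly-quote character. A minimal Python sketch, assuming the text has already been read in as Unicode (for example, from a UTF-8 file); the function name `straighten_quotes` is illustrative, and only the four common curly quotes are handled:

```python
# The four common curly quotes and their straight ASCII replacements.
CURLY_TO_STRAIGHT = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also the typographic apostrophe)
}


def straighten_quotes(text: str) -> str:
    """Apply one 'replace all' per curly quote, as in the manual editor procedure."""
    for curly, straight in CURLY_TO_STRAIGHT.items():
        text = text.replace(curly, straight)
    return text


print(straighten_quotes("\u201cIt\u2019s swollen,\u201d she said."))  # "It's swollen," she said.
```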
--
You looking at my hog? Don't look at my hog... or my motorcycle.
harry newton
2017-10-07 23:23:34 UTC
Post by Auric__
Copy the text to your text editor (or if already in a text file, open the
file). Select a "curly quote", copy it. Replace all, paste the copied curly
into the "find this" box, and then type a regular quote in the "replace it
with this" box, replace all. Repeat for the "close curly quotes".
I should have mentioned that the curly quotes are just the tip of the
iceberg, and even they have "opening" and "closing" curly quotes, and even
then, they have "single" and "double" curly quotes ... and there's lots
more of this "curly-quote" stuff, so cutting and pasting isn't even close
to a solution.

I'm looking for a program that just does away with all non-standard
"us-ascii" characters that aren't on a typical American US English
keyboard.

The use model requested is:
a. Copy dastardly bastardized text (which is most web pages)
b. Paste into this Jesus program (which absolves all curly-quote sins)
c. Then cut and paste out of that Jesus program into the text file or
Usenet post.
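That use model amounts to a filter that maps common typographic characters to ASCII look-alikes and flags everything else. A Python sketch under these assumptions: the replacement table is a hand-picked, incomplete list; accents are stripped via Unicode NFKD decomposition; anything still outside printable ASCII becomes "?" so it stays visible rather than silently lost. The names `to_ascii` and `TYPOGRAPHIC` are hypothetical, and the file-based entry point stands in for clipboard handling, which is not shown.

```python
import sys
import unicodedata

# Hand-picked typographic characters and plain-ASCII stand-ins (not exhaustive).
TYPOGRAPHIC = {
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",  # single quotes
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u201f": '"',  # double quotes
    "\u2032": "'", "\u2033": '"',                                # prime, double prime
    "\u2013": "-", "\u2014": "--", "\u2212": "-",                # en dash, em dash, minus
    "\u2026": "...",                                             # ellipsis
    "\u00a0": " ", "\u2009": " ", "\u202f": " ",                 # no-break and thin spaces
    "\u200b": "", "\ufeff": "",                                  # zero-width space, BOM
    "\u2022": "*",                                               # bullet
}
TABLE = str.maketrans(TYPOGRAPHIC)


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Reduce text to printable US-ASCII plus tab, CR and LF."""
    # 1. Replace the known typographic characters.
    text = text.translate(TABLE)
    # 2. NFKD splits accented letters into base letter + combining mark
    #    (e-acute -> "e" + U+0301) and expands compatibility forms (fi ligature -> "fi").
    text = unicodedata.normalize("NFKD", text)
    out = []
    for ch in text:
        if ch in "\t\n\r" or " " <= ch <= "~":
            out.append(ch)            # printable ASCII and line structure
        elif unicodedata.combining(ch):
            continue                  # drop accents left over from NFKD
        else:
            out.append(placeholder)   # flag anything else instead of hiding it
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical usage: python to_ascii.py pasted.txt > clean.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(to_ascii(f.read()))
```

Because the placeholder is visible, any "?" left in the output marks a character that still needs a table entry.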

For more on those sinful dastard non-standard-character abominations, see:
<https://practicaltypography.com/straight-and-curly-quotes.html>
<https://www.theatlantic.com/technology/archive/2016/12/quotation-mark-wars/511766/>
etc.

Maybe this will work, but it's ponderous:
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>
Mayayana
2017-10-08 03:16:53 UTC
"harry newton" <***@is.invalid> wrote

| > Copy the text to your text editor (or if already in a text file, open the
| > file). Select a "curly quote", copy it. Replace all, paste the copied curly
| > into the "find this" box, and then type a regular quote in the "replace it
| > with this" box, replace all. Repeat for the "close curly quotes".
|
| I should have mentioned that the curly quotes are just the tip of the
| iceberg, and even they have "opening" and "closing" curly quotes, and even
| then, they have "single" and "double" curly quotes ... and there's lots
| more of this "curly-quote" stuff, so cutting and pasting isn't even close
| to a solution.
|
| I'm looking for a program that just does away with all non-standard
| "us-ascii" characters that aren't on a typical American US English
| keyboard.
|
I can't see your tinypic links. Apparently they
require script. But I know what you mean. That also
drives me crazy. It's an entirely unnecessary
complication.
Auric's solution is the most realistic. I know there
are different characters, but usually not many.
Two kinds of curly quotes and unicode white
space are the most common. There's no way to
make a generic program to treat all possibilities
because you're substituting ANSI characters for
UTF-8. The possibilities go into the thousands.
If you just save as ANSI and then replace anything
funky, it's not too bad. Otherwise, you can just save
the text file as UTF-8.
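Whether a pasted snippet can be saved as "ANSI" at all depends on whether every character exists in the local code page. A small Python check, assuming the English Windows code page 1252 (Python's `cp1252` codec); the helper name `unrepresentable` is illustrative. Curly quotes survive, since cp1252 has them, but characters such as arrows or thin spaces do not, and with Python's `errors="replace"` they become `?` (other programs may substitute differently):

```python
def unrepresentable(text: str, codepage: str = "cp1252") -> list:
    """Return the distinct characters of text that the code page cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "\u201cSwollen\u201d battery \u2192 replaced\u2009now"
print([f"U+{ord(ch):04X}" for ch in unrepresentable(sample)])  # ['U+2192', 'U+2009']
print(sample.encode("cp1252", errors="replace"))  # b'\x93Swollen\x94 battery ? replaced?now'
```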

I agree with you. There's no reason to use curly
quotes. Using the ASCII versions means not needing
to use UTF-8 encoding. If UTF-8 were really necessary
it would be different, but most of the world lives by
ANSI. And webpages in European languages work just
fine with ANSI. Microsoft is one of the worst for that
problem. They write pages intended for an English-speaking
audience, in English, then use just a handful of unnecessary
UTF-8 characters that break the ANSI continuity. It makes
no sense.
pyotr filipivich
2017-10-08 04:15:58 UTC
Post by Mayayana
I agree with you. There's no reason to use curly
quotes. Using the ASCII versions means not needing
to use UTF-8 encoding. If UTF-8 were really necessary
it would be different, but most of the world lives by
ANSI. And webpages in European languages work just
fine with ANSI. Microsoft is one of the worst for that
problem. They write pages intended for an English-speaking
audience, in English, then use just a handful of unnecessary
UTF-8 characters that break the ANSI continuity. It makes
no sense.
IMOSHO, it makes no sense, but then it is Microsoft. Which often
seem to have a lot of "I'm sure it makes sense - not to me, but to
someone" elements.
--
pyotr filipivich
Next month's Panel: Graft - Boon or blessing?
Mayayana
2017-10-08 14:41:02 UTC
"pyotr filipivich" <***@mindspring.com> wrote

|>Microsoft is one of the worst for that
| >problem. They write pages intended for an English-speaking
| >audience, in English, then use just a handful of unnecessary
| >UTF-8 characters that break the ANSI continuity. It makes
| >no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often
| seem to have a lot of "I'm sure it makes sense - not to me, but to
| someone" elements.
|

That's a generous view. I don't see a problem
with switching to UTF-8, but what MS are doing is
to deliberately and unnecessarily break ASCII
compatibility without any need to do so, by replacing
quotes and spaces with unicode characters in UTF-8.
It seems to be a kind of political correctness attitude.
Nearly all English pages can easily be both ASCII and
UTF-8.

I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?
Wolf K
2017-10-08 15:24:21 UTC
Post by Mayayana
|>Microsoft is one of the worst for that
| >problem. They write pages intended for an English-speaking
| >audience, in English, then use just a handful of unnecessary
| >UTF-8 characters that break the ANSI continuity. It makes
| >no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often
| seem to have a lot of "I'm sure it makes sense - not to me, but to
| someone" elements.
|
That's a generous view. I don't see a problem
with switching to UTF-8, but what MS are doing is
to deliberately and unnecessarily break ASCII
compatibility without any need to do so, by replacing
quotes and spaces with unicode characters in UTF-8.
It seems to be a kind of political correctness attitude.
Nearly all English pages can easily be both ASCII and
UTF-8.
I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?
ANSI = ASCII plus 128 to 255. Most ANSI codes have Unicode counterparts.

See
http://ascii-table.com/ansi-codes.php
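The "most" can be checked for code page 1252 with Python's `cp1252` codec. In that codec five bytes in the 128-255 range are undefined; bytes 0xA0-0xFF map to the code points with the same number (the Latin-1 block), while 0x80-0x9F hold extras such as the euro sign and the curly quotes. This reflects Python's table only; other code pages, and Windows' own conversion functions, may treat the undefined bytes differently.

```python
mapped = {}      # byte value -> Unicode character
undefined = []   # byte values the cp1252 codec has no character for

for b in range(128, 256):
    try:
        mapped[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)

print(len(mapped), "of 128 upper bytes have a Unicode counterpart")
print("undefined:", [hex(b) for b in undefined])

# 0xA0-0xFF: same number as the Unicode code point (Latin-1 block).
print(all(ord(mapped[b]) == b for b in range(0xA0, 0x100)))
# 0x80-0x9F: typographic extras, e.g. the euro sign and the left curly quote.
print(f"0x80 -> U+{ord(mapped[0x80]):04X}, 0x93 -> U+{ord(mapped[0x93]):04X}")
```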
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
Mayayana
2017-10-08 16:34:40 UTC
"Wolf K" <***@sympatico.ca> wrote

| ANSI = ASCII plus 128 to 255. Most ANSI codes have Unicode counterparts.
|
| See
| http://ascii-table.com/ansi-codes.php
|

Answered in another post. Note that page you
linked explains that it's showing codepage 1252.
The standard Windows English codepage. That
only holds if your local language is set to English.

The whole thing gets very complicated. I tried
to clarify it in my other post. In brief, ASCII is the
same for everyone and deals with representing
characters with single byte values from 0-127.
ANSI adds the rest of the byte - 128-255. But the
character represented depends on the local codepage.
Russian ANSI text is not the same as English ANSI
text and Turkish will be different yet again.
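To see that one "ANSI" byte means different things under different code pages, this Python snippet decodes a single byte, 0xFD, with the codecs for the Western European (1252), Cyrillic (1251) and Turkish (1254) Windows code pages:

```python
import unicodedata

raw = bytes([0xFD])  # a single byte in the 128-255 range
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp1251", "cp1254")}

for cp, ch in decoded.items():
    print(cp, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

On an English system the byte shows up as "ý", on a Russian one as "э", and on a Turkish one as a dotless "ı".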

Unicode (in the UTF-16 form Windows uses) takes 2 bytes for most
characters. A fundamentally different way to encode characters. It allows for
characters in Russian, Turkish, etc to all have their
own unique numeric values.

UTF-8, which is how most webpages are now encoded,
is a byte-oriented encoding that uses 1-4 bytes to represent
each character in the unicode set. (Byte-oriented in this case means
that each byte is read as a signifier, while in UTF-16
2 bytes at a time are read as a signifier.)

So... "a" is 97 in ASCII. It's 97 in ANSI. It's 97 in UTF-8.
All one byte. In unicode it's byte 0 followed by byte 97.
But curly quotes are not in ASCII. In the English ANSI
codepage they're 147 and 148. But not in other codepages.
In unicode they're 8220 and 8221. 8220 would be represented
by byte values 32 and 28. 2 bytes for the single character,
read as a single, 2-byte numeric value. In UTF-8 encoding
the left curly quote is rendered with bytes 226-128-156
(hex E2 80 9C). It's not a 3-byte number. It's a pattern of 3
1-byte numbers.
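The byte values in that paragraph can be reproduced with Python's built-in codecs. Note that "32 and 28" is big-endian byte order; Windows stores its "Unicode" (UTF-16) text little-endian, so the same character appears in such a file as 28, 32:

```python
left_quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK
print(ord(left_quote))  # 8220

for codec in ("cp1252", "utf-16-be", "utf-16-le", "utf-8"):
    data = left_quote.encode(codec)
    print(f"{codec:10} {list(data)}")
```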

If you download this webpage in a hex editor and look at
the bytes you can see:

https://www.dwheeler.com/essays/quotes-test-utf-8.html

The page also shows how it's possible to use non-standard
characters in standard ASCII HTML by using HTML encoding:
&#8220; will render as a left curly quote, regardless of
language.
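Python's standard library handles both directions of that HTML trick: `html.unescape` turns numeric character references back into characters, and encoding to ASCII with the `xmlcharrefreplace` error handler writes each non-ASCII character as a decimal reference, so the result is pure ASCII but still renders curly quotes in a browser:

```python
import html

# Decimal references -> real characters.
decoded = html.unescape("&#8220;quoted&#8221;")
print(ascii(decoded))  # '\u201cquoted\u201d'

# Real characters -> pure-ASCII HTML with decimal references.
ascii_html = "\u201cquoted\u201d".encode("ascii", errors="xmlcharrefreplace")
print(ascii_html)  # b'&#8220;quoted&#8221;'
```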
Peter Moylan
2017-10-09 02:22:49 UTC
Post by Wolf K
Post by Mayayana
|>Microsoft is one of the worst for that
| >problem. They write pages intended for an English-speaking
| >audience, in English, then use just a handful of unnecessary
| >UTF-8 characters that break the ANSI continuity. It makes
| >no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often
| seem to have a lot of "I'm sure it makes sense - not to me, but to
| someone" elements.
|
That's a generous view. I don't see a problem
with switching to UTF-8, but what MS are doing is
to deliberately and unnecessarily break ASCII
compatibility without any need to do so, by replacing
quotes and spaces with unicode characters in UTF-8.
It seems to be a kind of political correctness attitude.
Nearly all English pages can easily be both ASCII and
UTF-8.
I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?
ANSI = ASCII plus 128 to 255. Most ANSI codes have Unicode counterparts.
See
http://ascii-table.com/ansi-codes.php
The following quote from Wikipedia is accurate, to the best of my knowledge.

<quote>
The phrase ANSI character set has no official meaning and has been used
to refer to the following, among other things:

- Windows code pages, a collection of 8-bit character sets compatible
  with ASCII but incompatible with each other, especially those code pages
  that are partly compatible with ISO-8859, most commonly Windows Latin 1
- ASCII, a 7-bit character set. (Very rarely.)
- ANSEL, the American National Standard for Extended Latin Alphabet
  Coded Character Set. (Very rarely.)
- ISO-8859, a collection of 8-bit character sets compatible with
  ASCII. (Very rarely.)
</quote>

In my experience the first is the most common meaning. It is the
character set that some Microsoft software calls ISO-8859-1, but which
is not compatible with the real ISO-8859-1, nor in fact with any ISO
character set. As such it is useful only for transmitting information
between Windows users, and apart from that use it has serious
portability issues. Not recommended for general use, mostly because of
the conflict with ISO-8859-1.
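The incompatibility shows up in the 0x80-0x9F range. In Windows-1252, bytes 0x93 and 0x94 are curly quotes; in the real ISO-8859-1 they are invisible C1 control codes. (In practice, web browsers following the WHATWG Encoding Standard decode pages labelled ISO-8859-1 as windows-1252.) A quick Python comparison:

```python
data = b"\x93quoted\x94"

as_cp1252 = data.decode("cp1252")      # Windows-1252: 0x93/0x94 are curly quotes
as_latin1 = data.decode("iso-8859-1")  # real ISO-8859-1: 0x93/0x94 are C1 controls

print(f"U+{ord(as_cp1252[0]):04X}", f"U+{ord(as_latin1[0]):04X}")  # U+201C U+0093
print(as_latin1[0].isprintable())  # False
```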

One might argue that Microsoft web pages are intended to be read only by
Windows users, but that's not entirely true. For example, I have had to
read them on my OS/2 computer when my Windows 10 computer locked up.
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
Paul
2017-10-09 03:41:37 UTC
Post by Peter Moylan
when my Windows 10 computer locked up.
Ouch.

I had that happen, with ImageMagick and the OpenMP library.

I re-ran the test case a couple days ago on the Win10 Insider
edition, and it was fixed! Only took a year.

So I take back what I said about them never fixing stuff.
Whether it was by accident or by design, it no longer
freezes for that test case.

The test case involves opening a 10GB .psb file with
ImageMagick and trying to display it. It's supposed to
say "out of memory" and it actually survives now to
report that. The image is a panorama with more than
4 billion pixels. Previously, it railed all the
cores on the CPU during one stage of the operation,
and there was no way to kill it (nothing on the
computer worked).

*******

In the Feedback Hub, I asked them for a Task Manager
that actually worked. I still haven't received a
response.

Paul
Mayayana
2017-10-09 04:42:04 UTC
"Peter Moylan" <***@pmoylan.org.invalid> wrote

| Windows code pages, a collection of 8-bit character sets compatible
| with ASCII but incompatible with each other, especially those code pages
| that are partly compatible with ISO-8859
....
|
| In my experience the first is the most common meaning. It is the
| character set that some Microsoft software calls ISO-8859-1, but which
| is not compatible with the real ISO-8859-1, nor in fact with any ISO
| character set. As such it is useful only for transmitting information
| between Windows users, and apart from that use it has serious
| portability issues.

Read my posts. ANSI is generally used to refer to
8-bit encoding using codepages. It's not just the
Windows English codepage. Look up code page.
Your system uses a code page to interpret characters
128+ in 8-bit/1 byte encoding. It started with ASCII,
which was all anyone needed. English characters.
When computing spread there was a need to accommodate
different languages. So 7-bit ASCII was adapted to
8-bit encoding, providing an extra 128 characters. Those
characters were then assigned according to code pages.
Your system uses a code page in accord with the language
you're using. That's what people call ANSI. 8-bit encoding
using a local codepage to define characters 128+.

It can sometimes get a bit sticky because there are
useful characters in the 128+ range of the English codepage.
Like curly quotes. And it used to be that those could be
used without worry because there was little mixing of languages
in computing. But as a result most people don't realize that
someone using another language won't see curly quotes
because their codepage will define those character values
differently. So it's not about compatibility between Windows
users.
(There's sometimes a similar problem with fonts. Not all
fonts display the same characters. Someone on Windows
might use the Wingdings font to display astrological
symbols, for instance. But another Windows user without
that font, or a Mac user, will probably see the character as
it's displayed in Times New Roman or Helvetica, respectively.
Those fonts don't have astrological symbols. So Gemini,
say, will probably render as "d", "M", or some such.
People think it's just a character. But it's not that
standardized.)


According to the Wikipedia page, the only encoding
officially designated by American National Standards Institute
(ANSI) was ASCII, but in general usage ANSI refers to
8-bit/1-byte encoding using codepages. That's a
very specific, defined kind of encoding. They can
say it's a "misnomer" to call it ANSI, but that's generally
what it's called. It's like calling a tissue a kleenex. The
Kleenex company might want to split hairs, but I don't
know anyone who says "facial tissue". And it helps to have
the popular ANSI definition because the whole range of
codepages, with 8-bit encoding, constitutes a single system
of encoding that's not the same as ASCII or unicode.

| One might argue that Microsoft web pages are intended to be read only by
| Windows users, but that's not entirely true.

Not at all. Plenty of people on Linux or Macs might
want to access Windows docs or may also have
Windows computers. Microsoft are not publishing their
pages as ANSI 1252.

*That's the problem that we've been talking about
in this thread.*

They're publishing them as UTF-8
to be international, and the rare non-ASCII characters
used cause problems for people who want to save the
text as ANSI, which is still the default for most purposes.

These days it's easy to save as UTF-8. Notepad
provides that option. (I think GVIM doesn't, but only because
it's meant for use as a code editor, not a text editor.)
Personally, though, I prefer to save as ANSI. (And
yes, Notepad calls it ANSI, too.) Although unicode
has become necessary, as an English speaker I almost
never need it, so I like to keep my
files all the same and not have to worry about which
is encoded how. And there's always a chance that
some software won't recognize UTF-8.
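Keeping a mix of ANSI and UTF-8 files readable mostly comes down to guessing the encoding on input. One common heuristic, sketched in Python: honour a UTF-8 byte-order mark (Notepad of this era writes one, EF BB BF, when saving as UTF-8), otherwise try strict UTF-8, and fall back to code page 1252. The function name is illustrative, and the guess can be wrong: a cp1252 file occasionally happens to be valid UTF-8.

```python
import codecs


def decode_text(data: bytes) -> str:
    """Guess between UTF-8 (with or without BOM) and Windows-1252 'ANSI'."""
    if data.startswith(codecs.BOM_UTF8):
        # A UTF-8 byte-order mark settles it; strip the mark itself.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        return data.decode("utf-8")  # strict: fails on most non-ASCII cp1252 bytes
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")  # English-language "ANSI"


# Usage with a hypothetical file name:
# with open("notes.txt", "rb") as f:
#     text = decode_text(f.read())
```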
pyotr filipivich
2017-10-08 15:48:20 UTC
Post by Mayayana
|>Microsoft is one of the worst for that
| >problem. They write pages intended for an English-speaking
| >audience, in English, then use just a handful of unnecessary
| >UTF-8 characters that break the ANSI continuity. It makes
| >no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often
| seem to have a lot of "I'm sure it makes sense - not to me, but to
| someone" elements.
|
That's a generous view. I don't see a problem
with switching to UTF-8, but what MS are doing is
to deliberately and unnecessarily break ASCII
compatibility without any need to do so,
MS has a policy of adopting something, modifying it out of
recognition, then insisting nothing else be used.
Post by Mayayana
by replacing
quotes and spaces with unicode characters in UTF-8.
It seems to be a kind of political correctness attitude.
Nearly all English pages can easily be both ASCII and
UTF-8.
I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?
May be the word processing package they're using, which
automagically typesets on the fly.
--
pyotr filipivich
Next month's Panel: Graft - Boon or blessing?
J. P. Gilliver (John)
2017-10-08 15:58:57 UTC
[]
Post by pyotr filipivich
Post by Mayayana
I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?
May be the word processing package they're using, which
automagically typesets on the fly.
Word does that (by default - you can turn it off); if I type "Fred", it
will convert the quotes into the 66 and 99 form (I think it calls them
"smart quotes"). [I think it does the same with single quotes, 'Fred'.]
You can stop it doing it either by turning off the setting, or on a
one-off basis by doing an Undo (Ctrl-Z) immediately after typing the ".
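Word's exact AutoFormat rules are not spelled out here, but the basic idea can be approximated: a straight quote at the start of the text or after a space or opening bracket becomes an opening ("66") quote, and any other becomes a closing ("99") quote, which also turns apostrophes into the right character. A rough Python sketch with a hypothetical function name; it ignores cases such as a leading apostrophe ('90s) that real smart-quote code has to handle:

```python
OPENING = {'"': "\u201c", "'": "\u2018"}  # "66" forms
CLOSING = {'"': "\u201d", "'": "\u2019"}  # "99" forms (also the apostrophe)


def smarten(text: str) -> str:
    """Approximate smart quotes: open after start/space/bracket, otherwise close."""
    out = []
    for i, ch in enumerate(text):
        if ch in OPENING:
            prev = text[i - 1] if i > 0 else " "
            opens = prev.isspace() or prev in "([{"
            out.append(OPENING[ch] if opens else CLOSING[ch])
        else:
            out.append(ch)
    return "".join(out)


print(ascii(smarten("'Fred' isn't \"here\"")))  # '\u2018Fred\u2019 isn\u2019t \u201chere\u201d'
```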

I wouldn't be surprised if some web-page editing software behaves
similarly.
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
harry newton
2017-10-09 04:06:35 UTC
Post by J. P. Gilliver (John)
Word does that (by default - you can turn it off); if I type "Fred", it
will convert the quotes into the 66 and 99 form (I think it calls them
"smart quotes"). [I think it does the same with single quotes, 'Fred'.]
You can stop it doing it either by turning off the setting, or on a
one-off basis by doing an Undo (Ctrl-Z) immediately after typing the ".
I wouldn't be surprised if some web-page editing software behaves
similarly.
I tried Word ... and failed.

I was hoping I could paste into MS Word 2007 to have Word automatically
remove the curly quotes, replacing them with keyboard quotes (aka ASCII
quotes).

This article shows how to turn off the Microsoft Word default to convert
keyboard quotes into "smart quotes" (aka curly quotes).
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>

Here's a screenshot of the keyboard-to-smart quotes GUI in my Word 2007:
<http://wetakepic.com/images/2017/10/09/smartquotes.jpg>

But a test cut-and-paste of the original article into MS Word failed to
convert the curly quotes to keyboard quotes (or ASCII quotes).
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>

It didn't work:
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>

Since I don't know MS Word all that well I wonder aloud whether MS Word can
convert curly quotes to keyboard quotes.

If Microsoft Word can convert curly quotes to keyboard quotes, the issue
would be solved instantly.

Does anyone know if Word can convert those curly quotes to keyboard quotes?
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>
Char Jackson
2017-10-09 05:27:12 UTC
Post by harry newton
Post by J. P. Gilliver (John)
Word does that (by default - you can turn it off); if I type "Fred", it
will convert the quotes into the 66 and 99 form (I think it calls them
"smart quotes"). [I think it does the same with single quotes, 'Fred'.]
You can stop it doing it either by turning off the setting, or on a
one-off basis by doing an Undo (Ctrl-Z) immediately after typing the ".
I wouldn't be surprised if some web-page editing software behaves
similarly.
I tried Word ... and failed.
I was hoping I could paste into MS Word 2007 to have Word automatically
remove the curly quotes, replacing them with keyboard quotes (aka ASCII
quotes).
This article shows how to turn off the Microsoft Word default to convert
keyboard quotes into "smart quotes" (aka curly quotes).
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>
<http://wetakepic.com/images/2017/10/09/smartquotes.jpg>
But a test cut-and-paste of the original article into MS Word failed to
convert the curly quotes to keyboard quotes (or ASCII quotes).
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>
Since I don't know MS Word all that well I wonder aloud whether MS Word can
convert curly quotes to keyboard quotes.
If Microsoft Word can convert curly quotes to keyboard quotes, the issue
would be solved instantly.
Does anyone know if Word can convert those curly quotes to keyboard quotes?
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>
In my limited experience, the answer is yes (configurable to be
automatic) if you're typing, and no if you're pasting. Are you seeing a
different result?

Either way, once you've pasted, it's trivial to record a macro that
converts from what you don't want to what you do want. The only issue I
take with that approach is that Word's macro recorder seldom produces
VBA code as efficient as what I can create manually, but it's usually a
great start.
harry newton
2017-10-09 07:52:56 UTC
Post by Mayayana
Auric's solution is the most realistic. I know there
are different characters
Since my Windows default text editor is VIM, I guess I could define macros.
Like any VI user, I'm adept at global search-and-replace - but I'm not sure
how to handle the non-printing characters in that search.

I don't know what to look for but the search-and-replace command would look
something like this, where "x" below indicates the unknown character
sequence (which is probably a few keystrokes).

%s/x/"/g

Which means (offhand):
% = for the whole file
s = substitute (search and replace) on each line
/ = for
x = (this is where I'll need the character sequence for a curly quote)
/ = and replace it with
" = literally the doublequote
/ = and do that
g = every time you see it in each line

Does anyone know offhand what the super secret character set is to type on
a US American English Keyboard for the curly quotes?
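There is no single key sequence to type; the practical route is to find out which code points are in the pasted text and search for them by number. Vim's pattern atom `\%u201c` matches the character U+201C (`\%U` takes up to eight hex digits for characters beyond U+FFFF), so a substitution such as `:%s/\%u201c\|\%u201d/"/g` replaces both double curly quotes, assuming Vim's 'encoding' is utf-8 so the pasted text arrived intact. In insert mode, Ctrl-V u201c types the character itself. A small Python helper (name hypothetical) that lists each distinct non-ASCII character with its Unicode name and the matching Vim atom:

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """List (code point, Unicode name, Vim search atom) for each distinct non-ASCII char."""
    rows, seen = [], set()
    for ch in text:
        cp = ord(ch)
        if cp > 127 and ch not in seen:
            seen.add(ch)
            # Vim: \%u takes up to 4 hex digits, \%U up to 8.
            atom = f"\\%u{cp:04x}" if cp <= 0xFFFF else f"\\%U{cp:08x}"
            rows.append((f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>"), atom))
    return rows


for row in describe_non_ascii("\u201cIt\u2019s swollen\u201d \u2014 Apple"):
    print(*row)
```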
Jason
2017-10-07 23:06:29 UTC
Post by harry newton
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
J. P. Gilliver (John)
2017-10-07 23:12:05 UTC
Post by Jason
Post by harry newton
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
They (the straight quotes) preceded ASCII and probably EBCDIC by quite
some time - they're on my old Imperial typewriter ...
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

The thing about smut is it harms no one and it's rarely cruel. Besides, it's a
gleeful rejection of the dreary and the "correct".
- Alison Graham, RT 2014/10/25-31
Ken Blake
2017-10-07 23:19:19 UTC
On Sun, 8 Oct 2017 00:12:05 +0100, "J. P. Gilliver (John)"
Post by J. P. Gilliver (John)
Post by Jason
Post by harry newton
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
They (the straight quotes) preceded ASCII and probably EBCDIC by quite
some time - they're on my old Imperial typewriter ...
Yes, that's because they use only a single key. Curly quotes (the
normal quotes, as Jason said) would require two keys.
harry newton
2017-10-07 23:35:14 UTC
Post by Ken Blake
Post by J. P. Gilliver (John)
Post by Jason
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
They (the straight quotes) preceded ASCII and probably EBCDIC by quite
some time - they're on my old Imperial typewriter ...
Yes, that's because they use only a single key. Curly quotes (the
normal quotes, as Jason said) would require two keys.
I stand completely humbled before you as I admit that curly quotes (and all
that other sinful related stuff that isn't on a US American computer
keyboard) came first, simply because, well, typography predates computers.

OK. So curly stuff came first. I do admit that fact.
But it's not just curly quotes. There's tons more stuff in web pages that
just doesn't render in text. The curly quote is just the tip of the iceberg.

All I want is so simple that I can't believe anyone wouldn't want it.
I cut and paste web text into a text file as I research stuff.
I just want all the pasted text to be visible characters, not black boxes.

What program or method easily converts all that curly typography stuff to
just the character set that ASCII Windows editors like Gvim use?
Wolf K
2017-10-08 00:58:20 UTC
On 2017-10-07 19:35, harry newton wrote:
[...]
Post by harry newton
All I want is so simple that I can't believe anyone wouldn't want it.
[...]

Well, I don't want it. I guess I'm ab/subnormal. :-)
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
harry newton
2017-10-09 07:52:54 UTC
Post by Wolf K
Post by harry newton
All I want is so simple that I can't believe anyone wouldn't want it.
[...]
Well, I don't want it. I guess I'm ab/subnormal. :-)
Fair enough.

A clarification is that you likely don't copy and paste research
snippets into a text file.

But if you did ....

Then you'd want all that non-printable black-box stuff out of your text
files!

:)
harry newton
2017-10-07 23:30:52 UTC
Post by Jason
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
Depends on which side of the fence you live on.
Has the Internet Killed Curly Quotes?
<https://www.theatlantic.com/technology/archive/2016/12/quotation-mark-wars/511766/>

But I was just using "curly quotes" as just one of maybe a dozen or more
common dastardly abominations which just don't translate into text on
Windows, as shown in this simple example from Butterick's Practical
Typography:
<https://practicaltypography.com/index.html#toc>
Where curly quotes are just one of many evils:
<https://practicaltypography.com/straight-and-curly-quotes.html>

The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.

This ponderous Microsoft Office approach might work - but I'm hoping for a
far simpler and less monolithic solution to the basic problem that
everyone should have if they cut and paste into text from the web.
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>
J. P. Gilliver (John)
2017-10-08 09:17:22 UTC
In message <orbo3b$1jb2$***@gioia.aioe.org>, harry newton
<***@is.invalid> writes:
[]
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur), but I
_do_ see the problem, and like you am surprised that no-one has written
a "simplify" utility that does what you want. (Or if they have, that
no-one has mentioned it.)

One _slight_ problem that might be encountered (though I would think it
could be overcome): non-standardisation. I've encountered - I can't say
in web pages, but I get it in emails/news posts that my elderly software
doesn't display as the originator intended. For example, let's take
those quotes you hate so much: sometimes, they've used something that my
software _does_ render as (in my case sloping rather than curly) quotes;
sometimes, they've used something that my software renders as some other
character (superscripted 2 and 3 are common); and sometimes they've used
something that just renders as a little rectangle, which is my
software's way of showing "unknown character" or something.

OK, you may say: the "simplify" utility could still handle this: it
would just have to render _all_ those possibilities into a quote
character (ASCII 34 decimal, "shifted 2" on some keyboards [as it's the
code for "2" with a bit changed - though I don't think it _is_ shifted 2
on the US layout]). But the potential problem comes when one example -
let's say font, though that's oversimplifying - uses code X for a quote
(albeit sloping/curly), but another one uses the _same_ code X for
something else: a space, say, or a ^, or %, or _. How would your
"simplify" utility know which to substitute? (But as I've said, I'm sure
it could be got over - perhaps by having it look at headers.)
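John's ambiguity point can be shown concretely. This short Python sketch (purely illustrative; the function name `interpret` is hypothetical, and the three code pages are chosen only for the example) decodes the single byte 0x93 under three different assumptions about the encoding:

```python
import unicodedata

# One stored byte; what it "is" depends on the encoding the reader assumes.
RAW = b"\x93"


def interpret(raw: bytes, encoding: str) -> str:
    """Decode raw bytes under the given encoding."""
    return raw.decode(encoding)


for enc in ("cp1252", "latin-1", "cp437"):
    ch = interpret(RAW, enc)
    # Control characters have no Unicode name, so supply a fallback label.
    name = unicodedata.name(ch, "<unnamed control character>")
    print(f"{enc:8} -> U+{ord(ch):04X} {name}")
```

Under Windows-1252 the byte is an opening curly quote, under Latin-1 an invisible control character, and under the old DOS code page 437 the letter ô. A converter therefore has to know, or be told, the source encoding before it can substitute anything, which is essentially the "look at headers" idea.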
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

As we journey through life, discarding baggage along the way, we should keep
an iron grip, to the very end, on the capacity for silliness. It preserves the
soul from desiccation. - Humphrey Lyttelton quoted by Barry Cryer in Radio
Times 10-16 November 2012
Ken Blake
2017-10-08 15:56:22 UTC
On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)"
Post by J. P. Gilliver (John)
[]
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur),
Using a text editor doesn't mean you're a dinosaur. Some of us
occasionally do things like create/modify .bat files.
J. P. Gilliver (John)
2017-10-08 16:06:03 UTC
Post by Ken Blake
On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)"
Post by J. P. Gilliver (John)
[]
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur),
Using a text editor doesn't mean you're a dinosaur. Some of us
occasionally do things like create/modify .bat files.
I said _some_ would say. I'm not one of them (-:
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
Ken Blake
2017-10-08 19:09:45 UTC
On Sun, 8 Oct 2017 17:06:03 +0100, "J. P. Gilliver (John)"
Post by Ken Blake
On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)"
Post by J. P. Gilliver (John)
[]
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur),
Using a text editor doesn't mean you're a dinosaur. Some of us
occasionally do things like create/modify .bat files.
Yes, I know. I understood that. I was merely commenting on the views
of any of the "some" who would say that.
harry newton
2017-10-09 07:52:53 UTC
Post by J. P. Gilliver (John)
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur), but I
_do_ see the problem, and like you am surprised that no-one has written
a "simplify" utility that does what you want. (Or if they have, that
no-one has mentioned it.)
Thanks for understanding the problem set where it would be *fantastic* if a
simple solution can be found, since *everyone* (who cuts and pastes into
text files) would want this capability to remove black box unprintable
text.
Char Jackson
2017-10-09 08:31:49 UTC
Post by harry newton
Post by J. P. Gilliver (John)
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur), but I
_do_ see the problem, and like you am surprised that no-one has written
a "simplify" utility that does what you want. (Or if they have, that
no-one has mentioned it.)
Thanks for understanding the problem set where it would be *fantastic* if a
simple solution can be found, since *everyone* (who cuts and pastes into
text files) would want this capability to remove black box unprintable
text.
Come on now, this far into the discussion and you're still talking about
black boxes, after multiple people have pointed out that just about
every other text editor doesn't have that problem? That doesn't seem
fair.
Ken Blake
2017-10-08 15:54:14 UTC
On Sat, 7 Oct 2017 23:30:52 +0000 (UTC), harry newton
Post by harry newton
Post by Jason
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
Depends on which side of the fence you live on.
Has the Internet Killed Curly Quotes?
<https://www.theatlantic.com/technology/archive/2016/12/quotation-mark-wars/511766/>
But I was just using "curly quotes" as just one of maybe a dozen or more
common dastardly abominations which just don't translate into text on
Windows, as shown in this simple example from Butterick's Practical
<https://practicaltypography.com/index.html#toc>
<https://practicaltypography.com/straight-and-curly-quotes.html>
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
"Normal text editor"? I just pasted curly quotes into Notepad to be
sure it handled curly quotes. It does.

If yours doesn't, I suggest you change your text editor.
J. P. Gilliver (John)
2017-10-08 16:10:30 UTC
In message <***@4ax.com>, Ken Blake
<***@invalid.news.com> writes:
[]
Post by Ken Blake
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
"Normal text editor"? I just pasted curly quotes into Notepad to be
sure it handled curly quotes. It does.
If yours doesn't, I suggest you change your text editor.
The distinction is blurred. To some people, a text editor is something
that doesn't do formatting, bold, italic, underlined, fonts, etcetera
(and thus NotePad is one such); to other people, it is one that only
works with ASCII codes 32 to 126 plus newline. There _are_ places where
only the latter is valid. (Headerless usenet, for example, though ANSI
characters _usually_ get through that unaltered.)
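Taking the stricter definition (codes 32 to 126 plus newline, which is 95 printable characters counting the space), a short Python sketch can report every character in a piece of text that falls outside it. The names `ALLOWED` and `non_ascii_report` are hypothetical, and tab and carriage return are deliberately left out to match that definition:

```python
from __future__ import annotations

# Printable 7-bit ASCII (32 = space through 126 = ~) plus newline.
ALLOWED = {chr(c) for c in range(32, 127)} | {"\n"}


def non_ascii_report(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every character outside ALLOWED."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text) if ch not in ALLOWED]


sample = "She said \u201chello\u201d.\n"
print(non_ascii_report(sample))
```

For the sample line it flags the two curly quotes, at positions 9 and 15, and nothing else.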
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
Ken Blake
2017-10-08 19:14:47 UTC
On Sun, 8 Oct 2017 17:10:30 +0100, "J. P. Gilliver (John)"
Post by J. P. Gilliver (John)
[]
Post by Ken Blake
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
"Normal text editor"? I just pasted curly quotes into Notepad to be
sure it handled curly quotes. It does.
If yours doesn't, I suggest you change your text editor.
The distinction is blurred. To some people, a text editor is something
that doesn't do formatting, bold, italic, underlined, fonts, etcetera
(and thus NotePad is one such); to other people, it is one that only
works with ASCII codes 32 to 126 plus newline. There _are_ places where
only the latter is valid. (Headerless usenet, for example, though ANSI
characters _usually_ get through that unaltered.)
To me, there are word processors (e.g. WordPerfect and Word), text
editors (e.g. Notepad) and *glorified* text editors (e.g. WordPad).

As far as I'm concerned, WordPad is a useless program. I don't need
anything in between a word processor and a text editor. Microsoft
probably provides WordPad for those people who don't want to spend the
money on a real word processor, but I think those people would be much
better off with Open Office or Libre Office.
harry newton
2017-10-09 04:18:15 UTC
Post by Ken Blake
To me, there are word processors (e.g. WordPerfect and Word), text
editors (e.g. Notepad) and *glorified* text editors (e.g. WordPad).
As far as I'm concerned, WordPad is a useless program. I don't need
anything in between a word processor and a text editor. Microsoft
probably provides WordPad for those people who don't want to spend the
money on a real word processor, but I think those people would be much
better off with Open Office or Libre Office.
I didn't know that using the word "text editor" would be construed as an
issue, so I apologize since even MS Word, Notepad, Adobe Acrobat Pro, or
say, The GIMP or PhotoShop can all be used as an "Editor of Text".

Since almost every program edits text, I don't think we want to go down the
path of defining whether Notepad is a bona-fide "text editor" (because in
my book, it's not a real text editor but in other people's books - it is -
where I just said EVERYTHING edits text so that argument would never end).

My text editor on Windows/Linux/Android is the standard vi clone.

I did supply a plethora of screenshots to show my text editing was in vi.
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>

What I *meant* by *text editor* is "pure text" or "real text" or whatever
it is called when all that formatting junk is removed leaving only the
characters on the screen that are on the keyboard itself.

The reason I'm using GVIM on Windows is that vi allows for quick edits.
Mayayana
2017-10-09 05:09:29 UTC
"harry newton" <***@is.invalid> wrote

| Since almost every program edits text, I don't think we want to go down the
| path of defining whether Notepad is a bona-fide "text editor" (because in
| my book, it's not a real text editor but in other people's books - it is -
| where I just said EVERYTHING edits text so that argument would never end).
|
| My text editor on Windows/Linux/Android is the standard vi clone.
|
| What I *meant* by *text editor* is "pure text" or "real text" or whatever
| it is called when all that formatting junk is removed leaving only the
| characters on the screen that are on the keyboard itself.

Going down the path of defining text editors would
actually be helpful. The main problem you're having is
that you don't understand the distinctions.

Text editor usually refers to plain text. No font options,
color, etc. But plain text can be of different encodings.
A text editor is used to write plain text that doesn't
need formatting. It then saves that text to file as a simple
stream of bytes. Typically each byte represents a character.
Sometimes 2 bytes represent a character.
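How many bytes one character takes depends on the encoding. A Python sketch (the helper name `encoded_bytes` is hypothetical) printing the opening curly quote in three encodings shows one byte in Windows-1252, two in UTF-16, and three in UTF-8, which can use up to four:

```python
# How many bytes one character occupies depends on the encoding chosen.
CHAR = "\u201c"  # opening curly double quote


def encoded_bytes(ch: str, encoding: str) -> str:
    """Return the bytes of ch in the given encoding as space-separated hex."""
    return ch.encode(encoding).hex(" ")


for enc in ("cp1252", "utf-16-le", "utf-8"):
    print(f"{enc:10} {encoded_bytes(CHAR, enc)}")
```

A program that guesses the wrong one of these will show the same file as different characters, or as boxes.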

Vi/vim/gvim is not a good text editor because it's designed
to be a coding editor. Besides the horrendous UI, it doesn't
seem to handle different encodings. That's why you're having
trouble. Anything you paste is rendered as ASCII, because Vi
is assuming that you're writing code.

Wordpad is known as a "rich text" editor, which seems to
be a Microsoft invention. It's pretty much what Word started
out as. Like HTML, Wordpad RTF files are composed of
plain ASCII text with special formatting characters and
patterns to tell the RTF rendering window how to display
font, color, etc. So like HTML, rich text displays as plain
text but is not. Word DOCs are a kind of ANSI monstrosity
with unicode sprinkled in, along with a lot of gibberish to
break compatibility with copycats. DOCX is actually a
ZIP file, containing numerous files and making heavy use of
Microsoft's current darling, XML, to specify formatting. It
looks like plain text when Word renders it, but it's a whole
different thing. (Look for the next MS Office format to
use JSON. There's always a next big thing bandwagon in
the tech world that everyone feels compelled to jump onto.)
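To see the ZIP structure for yourself, a Python sketch using the standard `zipfile` module can list the parts inside a .docx. The function name is hypothetical; a typical .docx should contain, among others, `[Content_Types].xml` and `word/document.xml`, the latter holding the body text as XML:

```python
import zipfile


def list_docx_parts(path_or_file) -> list:
    """Return the names of the parts stored inside a .docx (a ZIP archive).

    Accepts a filename or an open binary file object.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.namelist()
```

Usage, with a placeholder file name: `for name in list_docx_parts("fixquote.docx"): print(name)`.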

When you print text in GIMP or another graphic editor
that's usually raster graphics. In other words, it's not text
at all. It's a picture. In Notepad the byte 65 renders as A.
In GIMP an A is actually a bitmap, or part of a bitmap. Each
pixel in the A is represented by 3 bytes representing RGB
color values. So an A in GIMP might require 30,000 bytes
to render, while an A in Notepad requires one.

I find it helps to understand some of how it all works
behind the scenes in order to make sense of issues
like the one you're having with "dastardly text".
harry newton
2017-10-09 06:28:46 UTC
Post by Mayayana
The main problem you're having is
that you don't understand the distinctions.
I don't at all disagree. All this talk of utf-8 and ascii and ANSI is
driving me nuts in that I don't have the background for understanding.

Fundamentally, I have a keyboard that has a fundamental set of characters
on it that are the same characters that I want to show up in my cut and
pastes from a web site into file that is saved with the *.txt extension in
Windows.

I don't study languages, but the Babylonians seemed to communicate
perfectly well with such an arrangement of a small set of 26 fundamental
characters.

Why can't we be "allowed" to do the same?

Specifically I don't need nor want the "dastardly" non-keyboard characters.
All I'm trying to do is get rid of them as efficiently as possible.
Post by Mayayana
Text editor usually refers to plain text. No font options,
color, etc. But plain text can be of different encodings.
I think there should be a "keyboard encoding" which is simply the
fundamental set of characters that can be typed on a typical (US American)
keyboard!
:)

If I were to ask a Babylonian, I'd bet you he'd agree, and he was the one
who designed the alphabet (or so I'm told) so he knew what he was doing!

NOTE: I just looked up to doublecheck who created the alphabet, only to
find it's more complex than I had imagined - so I'll just say Babylonians
did it (but it's not that simple):
https://en.wikipedia.org/wiki/History_of_the_alphabet
Post by Mayayana
A text editor is used to write plain text that doesn't
need formatting. It then saves that text to file as a simple
stream of bytes. Typically each byte represents a character.
Sometimes 2 bytes represent a character.
If those characters are "keyboard characters" - that's all I want.
Post by Mayayana
Vi/vim/gvim is not a good text editor because it's designed
to be a coding editor.
My fingers are permanently engrained with muscle memory because my first
text editor ever (decades ago) was "vi" so I'm most efficient in vi because
I don't use the mouse to edit text files and I never will because nobody
ever typed faster by removing their fingers from the keyboard to mess with
a mouse.
Post by Mayayana
Besides the horrendous UI, it doesn't
seem to handle different encodings. That's why you're having
trouble. Anything you paste is rendered as ASCII, because Vi
is assuming that you're writing code.
GVIM failed.
Notepad failed.
MS Word failed.

I don't think any of the suggested programs that I've tested
converts the "dastardly" dotardly curly quotes to straight quotes.

Of course vi can do a search and replace.
I think that will have to be the solution.

I just have to figure out how to type the curly quote when it doesn't exist
on my keyboard.

Do you know what I type below for "x" to represent the curly quote?
%s/x/"/g

That sequence in vi-speak, is:
% = for the entire file
s/x = search for "x" (where I need to know how to define a curly quote)
/"/g = and replace it with a straight quote, for every instance found

If I just knew what to put there for "x" to represent the curly quote, I'd
be done. :)
Post by Mayayana
I find it helps to understand some of how it all works
behind the scenes in order to make sense of issues
like the one you're having with "dastardly text".
I admit information overload. It's more than I can handle.
All I would ask, at this point, is one thing:

How do I tell vi to recognize a curly quote?
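For doing this in one step rather than one :%s per character, here is the same idea sketched in Python instead of vi: a single translation table applied to the whole text. The table contents and the names `TO_KEYBOARD`, `to_keyboard` and `convert_file` are hypothetical choices for illustration, and the source encoding is an assumption the caller must get right (John's point about ambiguity applies):

```python
# Hypothetical table of "typographic" characters and keyboard stand-ins.
# The selection is an assumption; add or remove entries to taste.
TO_KEYBOARD = str.maketrans({
    "\u2018": "'",    # opening single quote
    "\u2019": "'",    # closing single quote / apostrophe
    "\u201c": '"',    # opening double quote
    "\u201d": '"',    # closing double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u00ad": None,   # soft hyphen: delete it
})


def to_keyboard(text: str) -> str:
    """Apply every mapping in one pass, like several :%s commands at once."""
    return text.translate(TO_KEYBOARD)


def convert_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Read src in the given encoding, convert, and write plain ASCII to dst."""
    with open(src, encoding=encoding) as f:
        text = to_keyboard(f.read())
    # Anything still outside ASCII is written as "?" rather than raising.
    with open(dst, "w", encoding="ascii", errors="replace") as f:
        f.write(text)
```

Usage, with placeholder file names: `convert_file("pasted.txt", "clean.txt")`. Characters not in the table (an accented letter, say) come out as "?", which at least makes them easy to spot.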
Char Jackson
2017-10-09 05:22:09 UTC
Post by harry newton
The reason I'm using GVIM on Windows is that vi allows for quick edits.
I tried to use Google to find "gvim quick edits" and came up empty. What
does GVIM do that other text editors don't do? What are quick edits? I
assume those two words don't have their obvious meaning, because every
text editor does that, so what is it?
harry newton
2017-10-09 06:10:25 UTC
Post by Char Jackson
Post by harry newton
The reason I'm using GVIM on Windows is that vi allows for quick edits.
I tried to use Google to find "gvim quick edits" and came up empty. What
does GVIM do that other text editors don't do? What are quick edits? I
assume those two words don't have their obvious meaning, because every
text editor does that, so what is it?
I apologize. My words were not clear. Hence I caused unnecessary confusion.

I appreciate that you followed the implied meaning of my words, which makes
it my fault for not explaining that "vi" is all I ever wanted. GVIM just
came with the package when I searched for the best "vi" editor on Windows.

I never use the GUI in GVIM. I never use the mouse. Both hands are on the
keyboard, because all I do with GVIM is use the "vi" part of VIM.

So what I meant by "quick edits" was my own concoction of words to describe
the fact that I can cut and paste and copy and rename and delete and
reorder, etc., all using keyboard presses.

For example, I can delete a line by typing "dd" and I can yank a line by
typing "yy" and I can paste those deleted or yanked lines ten lines lower
by typing "p" after jumping down ten lines with "10j" such that the
self-concocted phrase "quick edits" means something like this sequence:

yy10jp <== this is a "quick edit" yanking a line to place it 10 lines lower
Char Jackson
2017-10-09 06:28:53 UTC
Post by harry newton
Post by Char Jackson
Post by harry newton
The reason I'm using GVIM on Windows is that vi allows for quick edits.
I tried to use Google to find "gvim quick edits" and came up empty. What
does GVIM do that other text editors don't do? What are quick edits? I
assume those two words don't have their obvious meaning, because every
text editor does that, so what is it?
I apologize. My words were not clear. Hence I caused unnecessary confusion.
I appreciate that you followed the implied meaning of my words, which makes
it my fault for not explaining that "vi" is all I ever wanted. GVIM just
came with the package when I searched for the best "vi" editor on Windows.
I never use the GUI in GVIM. I never use the mouse. Both hands are on the
keyboard, because all I do with GVIM is use the "vi" part of VIM.
So what I meant by "quick edits" was my own concoction of words to describe
the fact that I can cut and paste and copy and rename and delete and
reorder, etc., all using keyboard presses.
For example, I can delete a line by typing "dd" and I can yank a line by
typing "yy" and I can paste those deleted or yanked lines ten lines lower
by typing "p" after jumping down ten lines with "10j" such that the
yy10jp <== this is a "quick edit" yanking a line to place it 10 lines lower
The actual keyboard presses would be different, but can't every text
editor do all of those things? I would say yes.

Thank you for the explanation, by the way!
harry newton
2017-10-09 07:25:36 UTC
Post by Char Jackson
Post by harry newton
yy10jp <== this is a "quick edit" yanking a line to place it 10 lines lower
The actual keyboard presses would be different, but can't every text
editor do all of those things? I would say yes.
Thank you for the explanation, by the way!
Again you are correct that I erred in not being precise in my English.

I apologize as I had typed that explanation using faulty "brain memory"
instead of the more reliable "muscle memory" that innervates my fingers.

Running a quick test case in vi on Windows, the steps to yank a line to
place it ten lines below are ... oh ... I see ... they are what I said they
were with the exception of pressing the "escape" key (depending on what
editing mode I was in at the time).

So, this will *always* work:
<Escape>yy10jp
But this will only work if I'm not in editing mode:
yy10jp

More to the point, what I want to do is globally replace those dotard curly
quotes with more politically correct keyboard quotes.

Assuming for our purpose the dastardly curly quotes are represented by the
keystroke "x", then the global search and replace sequence is:

<Escape>:%s/x/"/g

Which means, in vi speak...
<Escape> => Make sure you're in command mode (and not editing mode)
:% => run the following command on the entire file
s/x/ => search for "x" (where I need to know how to define a curly quote)
"/g => replace what you found with the double quote for all instances found

In reality I'd want to replace both "66" curly quotes (we'll call them "x")
and "99" curly quotes (we'll call them "y") with keyboard quotes, so the vi
command would be more like this (which itself can be simplified):

<Escape>:%s/x/"/g|%s/y/"/g
Where the bar "|" just separates the two editing commands on one line.
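The trailing g matters in these commands: with default settings, Vim's :s changes only the first match on each line unless the g flag is given. A small Python imitation of that behaviour (the function `substitute` is hypothetical and uses Python's regex syntax, not Vim's):

```python
import re


def substitute(text: str, pattern: str, repl: str, g: bool = False) -> str:
    """Imitate Vim's :%s/pattern/repl/ (default settings), one line at a time.

    Without g only the first match on each line is replaced; with g, every
    match is. Uses Python regex syntax, and repl is a Python replacement
    string, so backslashes in it are special.
    """
    count = 0 if g else 1  # for re.sub, count=0 means "no limit"
    return "\n".join(re.sub(pattern, repl, line, count=count)
                     for line in text.split("\n"))
```

On a line holding two curly-quoted words, `substitute(line, "[“”]", '"')` straightens only the first quote mark, while `g=True` straightens all four.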

If I wish to confirm each change, I just add the "c" (confirm) flag after the "g", giving gc:
<Escape>:%s/x/"/gc|%s/y/"/gc

Another way to search for "x" or "y" and replace with "z" is...
<Escape>:%s/[xy]/z/g

Which, in vi-speak, interprets to:
<Escape>:%s/ => In command mode, search the whole file for
[xy] => either "x" or "y"
/z/ => and replace what you found with "z"
g => doing so for every instance found in each line
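A note on the bracketed form: a character class is a set of single characters written with no separator, so a comma inside the brackets is simply one more character to match, in Vim as in most regex engines. The rule is easy to demonstrate with Python's `re` module, used here only as a stand-in for Vim's engine:

```python
import re

# Inside [...] each character is a member of the set; there is no separator,
# so the comma in "[x,y]" is itself one of the characters matched.
with_comma = re.compile("[x,y]")
without_comma = re.compile("[xy]")

sample = "x, y"
print(with_comma.sub("z", sample))     # the comma is replaced too
print(without_comma.sub("z", sample))  # only x and y are replaced

# The same idea with the two curly double quotes as the members of the set.
curly_doubles = re.compile("[\u201c\u201d]")
print(curly_doubles.sub('"', "\u201chi\u201d"))
```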

If I just knew what to type for "x" and "y" to represent those dastardly
curly quotes, I'd be done since it doesn't seem that there is an easier
way.

I will, for the first time, search for how to represent curly quotes in a
vi command....(since this appears to be the simplest solution).
Char Jackson
2017-10-08 16:57:51 UTC
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
The obvious answer is to use another text editor, one that doesn't have
the problems that you object to. I use and recommend Notepad++.

The other obvious approach is to write a macro, or a series of macros if
you want a more modular approach, that you can run to fix the evils that
you see with GVIM. Most text editors include macro capability, but if
yours doesn't, you mentioned having access to Word, so in a pinch you
could do it there by using VBA.

I repeat, though, the obvious answer is to use another text editor. If
Notepad++ isn't to your liking, many of my colleagues have settled on
Ultra Edit or Textpad, so you might give those a try.

Closing thought, does GVIM let you choose a better character set, one
that includes symbols for the things that are currently not able to be
displayed?
Ken Blake
2017-10-08 19:15:55 UTC
Post by Char Jackson
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
The obvious answer is to use another text editor, one that doesn't have
the problems that you object to. I use and recommend Notepad++.
Ditto to all three of those statements.
J. P. Gilliver (John)
2017-10-09 02:18:45 UTC
Post by Char Jackson
Post by harry newton
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.
The obvious answer is to use another text editor, one that doesn't have
the problems that you object to. I use and recommend Notepad++.
[]
Post by Char Jackson
I repeat, though, the obvious answer is to use another text editor. If
Notepad++ isn't to your liking, many of my colleagues have settled on
Ultra Edit or Textpad, so you might give those a try.
It isn't just the choice of editor: some _applications_ where you might
want to use the text only accept seven-bit characters.
Post by Char Jackson
Closing thought, does GVIM let you choose a better character set, one
that includes symbols for the things that are currently not able to be
displayed?
(I don't know GVIM, but I'm pretty sure the answer is) No. It uses 95
printable/displayable characters (or 94 if you argue that space isn't
one of them).
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

If you bate your breath do you catch a lung fish? (Glynn Greenwood 1996-8-23.)
harry newton
2017-10-09 04:24:01 UTC
Post by Char Jackson
I repeat, though, the obvious answer is to use another text editor. If
Notepad++ isn't to your liking, many of my colleagues have settled on
Ultra Edit or Textpad, so you might give those a try.
I'm looking for a solution for the cut-and-paste problem.
I have no problem cutting and pasting into an intermediary program.

In fact, I just tried MS Word as the intermediary; it failed.
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>

Why would NotePad++ work when MS Office failed?
Does NotePad++ have a special curly-quote-to-keyboard-quote macro?
Mayayana
2017-10-09 05:11:43 UTC
"harry newton" <***@is.invalid> wrote

| Why would NotePad++ work when MS Office failed?

You don't need either of those. Paste into Notepad
and save as UTF-8. That option appears in a dropdown
underneath the Save As textbox, in the Save As window.
harry newton
2017-10-09 05:53:24 UTC
| Why would NotePad++ work when MS Office failed?
You don't need either of those. Paste into Notepad
and save as UTF-8. That option appears in a dropdown
underneath the Save As textbox, in the Save As window.
1. I copied problematic text from an arbitrary curlyquote web site:
<https://practicaltypography.com/straight-and-curly-quotes.html>

Curly quotes are the quotation marks used in good typography.
There are four curly quote characters: the opening single quote
(‘), the closing single quote (’), the opening double quote (“),
and the closing double quote (”).

2. I pasted into a blank MS Word doc by typing <Windows+R> fixquote <Enter>
and then <Control+Alt+V> "Paste Special Unformatted Unicode Text".

3. MS Word 2007 didn't have a Save as "UTF-8" but it did have "Plain Text".
And yet, the text *still* had the non-printable non-keyboard characters!
Even when I selected <US-ASCII> as the encoding format.
<http://wetakepic.com/images/2017/10/09/fix3.jpg>

4. Having not used NotePad in centuries, I pasted into Notepad this time
Start > Run > notepad <Enter>
And saved as UTF-8.
Yikes! The result was even worse!
<http://wetakepic.com/images/2017/10/09/fix4.jpg>

I think I don't understand the problem because the result of that was a
"plain text" file which contained utterly unprintable non-keyboard
characters.

What did I do wrongly?
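One plausible explanation for the "even worse" result, assuming the saved file was then viewed by something that read it as Windows-1252 (or a similar single-byte code page): Notepad of that era started its UTF-8 files with a byte-order mark (EF BB BF), and each curly quote is three bytes in UTF-8, so each turns into three stray characters. A Python sketch of that misreading (the function name `misread` is hypothetical):

```python
def misread(text: str) -> str:
    """Save text as UTF-8 with a BOM, then (wrongly) read it back as Windows-1252.

    Bytes that have no character in Windows-1252 become U+FFFD.
    """
    raw = text.encode("utf-8-sig")  # EF BB BF, then the UTF-8 bytes
    return raw.decode("cp1252", errors="replace")
```

Under these assumptions `misread("“hi")` gives ï»¿â€œhi. A closing double quote fares worse: its last UTF-8 byte, 0x9D, has no character in Windows-1252 at all.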
Mike S
2017-10-09 06:01:14 UTC
| Why would NotePad++ work when MS Office failed?
You don't need either of those. Paste into Notepad
and save as UTF-8. That option appears in a dropdown
underneath the Save As textbox, in the Save As window.
<https://practicaltypography.com/straight-and-curly-quotes.html>
Curly quotes are the quotation marks used in good
typography. There are four curly quote
characters: the opening single quote (‘), the
closing single quote (’), the opening double
quote (“), and the closing double quote (”).
2. I pasted into a blank MS Word doc by typing <Windows+R> fixquote <Enter>
and then <Control+Alt+V> "Paste Special Unformatted Unicode Text".
3. MS Word 2007 didn't have a Save as "UTF-8" but it did have "Plain Text".
And yet, the text *still* had the non-printable non-keyboard characters!
Even when I selected <US-ASCII> as the encoding format.
<http://wetakepic.com/images/2017/10/09/fix3.jpg>
4. Having not used NotePad in centuries, I pasted into Notepad this time
Start > Run > notepad <Enter>
And saved as UTF-8.
Yikes! The result was even worse!
<http://wetakepic.com/images/2017/10/09/fix4.jpg>
I think I don't understand the problem because the result of that was a
"plain text" file which contained utterly unprintable non-keyboard
characters.
What did I do wrongly?
Can you make a Word doc available for d/l somewhere that has all of the
characters you want to replace with one 'bad' character, space,
replacement character pair per line?
harry newton
2017-10-09 06:55:43 UTC
Post by Mike S
Can you make a Word doc available for d/l somewhere that has all of the
characters you want to replace with one 'bad' character, space,
replacement character pair per line?
Yes, I can. And will. And just did.
<http://wetakepic.com/images/2017/10/09/sample.jpg>

1. This web page has the fundamental problem-pair set you ask about:
<https://practicaltypography.com/straight-and-curly-quotes.html>

2. So I created this simple sample text out of that web page:
<https://pastebin.com/BFRYNs1c>

3. I copied that sample text into this MS Word document:
<http://www.filedropper.com/fixquote>

4. I then copied from MS Word to the vi text editor to create this file:
<https://www.sendspace.com/file/bhgsl8>

All suffered from the same malady of having dotard curly quotes instead of
keyboard quotes. <http://wetakepic.com/images/2017/10/09/sample.jpg>
Mike S
2017-10-09 07:30:10 UTC
Post by harry newton
Post by Mike S
Can you make a Word doc available for d/l somewhere that has all of
the characters you want to replace with one 'bad' character, space,
replacement character pair per line?
Yes, I can. And will. And just did.
<http://wetakepic.com/images/2017/10/09/sample.jpg>
  <https://practicaltypography.com/straight-and-curly-quotes.html>
  <https://pastebin.com/BFRYNs1c>
  <http://www.filedropper.com/fixquote>
  <https://www.sendspace.com/file/bhgsl8>
All suffered from the same malady of having dotard curly quotes instead of
keyboard quotes. <http://wetakepic.com/images/2017/10/09/sample.jpg>
I used the VB6 Asc() function to see what character codes the two curly
quotes were; it returned character codes 147 and 148. I believe Word has
a similar VBA function, so you can check the character codes yourself e.g.

?asc("”")
?asc("“")

https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/asc-function
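Why Asc() gives 147 and 148: as far as I understand it, Asc() returns the character's code in the system's ANSI code page (assumed here to be Windows-1252, the usual one on US and Western European Windows), while AscW() returns the Unicode code point. The two lookups, sketched in Python with hypothetical function names:

```python
def ansi_code(ch: str, codepage: str = "cp1252") -> int:
    """Single-byte code of ch in an ANSI code page, roughly what VB's Asc() reports.

    Characters the code page cannot represent become "?" (63).
    """
    return ch.encode(codepage, errors="replace")[0]


def unicode_code(ch: str) -> int:
    """Unicode code point of ch, which is what VB's AscW() reports."""
    return ord(ch)


for ch in ("\u201c", "\u201d"):
    print(ansi_code(ch), unicode_code(ch))
```

It prints 147 8220 and then 148 8221. A character the code page cannot hold, such as → (U+2192), comes back as 63, the "?" substitute.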

This worked when I pasted the text into a textbox
input:
Bad: (“)(”)
output:
Bad: (")(")

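Dim s1 As String
s1 = Text1.Text
' In the Windows-1252 (ANSI) code page, 147 and 148 are the curly double quotes
s1 = Replace(s1, Chr$(147), Chr$(34))   ' opening curly quote -> straight quote (34)
s1 = Replace(s1, Chr$(148), Chr$(34))   ' closing curly quote -> straight quote (34)
Text1.Text = s1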

I believe you can use the Replace function to do character
substitution in an entire Word doc using VBA. This might be close to
what you need:

https://msdn.microsoft.com/en-us/library/office/aa211953(v=office.11).aspx

Is that approach useful?
Mike S
2017-10-09 07:49:25 UTC
Post by harry newton
Post by Mike S
Can you make a Word doc available for d/l somewhere that has all of
the characters you want to replace with one 'bad' character, space,
replacement character pair per line?
Yes, I can. And will. And just did.
<http://wetakepic.com/images/2017/10/09/sample.jpg>
   <https://practicaltypography.com/straight-and-curly-quotes.html>
   <https://pastebin.com/BFRYNs1c>
   <http://www.filedropper.com/fixquote>
   <https://www.sendspace.com/file/bhgsl8>
All suffered from the same malady of having dotard curly quotes instead of
keyboard quotes. <http://wetakepic.com/images/2017/10/09/sample.jpg>
If you are thinking of trying the VBA replacement function this might be
useful:

What's That Character?

If you knew the character's numeric code, you could search for it, but
this character falls way off the usual list. How can you find its
numeric code? Put the following macro in the template of your choice
[Hack #50], select Tools→Macro→Macros, choose WhatCharacterCode from
the list, and click the Run button:

Sub WhatCharacterCode()
    MsgBox Asc(Selection.Text)
End Sub

This macro will display the ASCII character code for the first character
in the current selection; you can then search for it using the ^0 syntax.

If the macro reports a value of 63 and fails to match the character, you
may be facing a Unicode character. The following macro will report the
Unicode code of a character, which you can search for using the ^u syntax:

Sub WhatUnicodeCharacterCode( )
MsgBox AscW(Selection.Text)
End Sub

The result displayed will be the decimal version of the Unicode
character code, not the hexadecimal version used when inserting Unicode
characters.

https://www.safaribooksonline.com/library/view/word-hacks/0596004931/ch04s05.html
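If Word isn't handy, Python's built-ins can report the same numbers the two macros show. This sketch assumes the "ANSI" codepage is Windows-1252 (Western). Like Asc(), it looks only at the first character and reports 63 ('?') for a character that codepage cannot hold. It also returns the hex form of the Unicode code. The function name is hypothetical.

```python
def char_codes(text: str) -> tuple[int, int, str]:
    """Return (ANSI code, decimal Unicode code, hex Unicode code).

    Only the first character is examined, as with VBA's Asc/AscW.
    The ANSI code assumes Windows-1252; unmappable characters give 63 ('?').
    """
    first = text[0]
    ansi = first.encode("cp1252", errors="replace")[0]
    code_point = ord(first)
    return ansi, code_point, format(code_point, "04X")


for ch in "\u201c\u201d\u2192":
    print(repr(ch), char_codes(ch))
```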
harry newton
2017-10-09 09:42:43 UTC
Post by Mike S
If you are thinking of trying the VBA replacement function this might be
What's That Character?
I just found how to figure out the non-printable character in vi!

I found out how to do that by reading these references:
<https://durgaprasad.wordpress.com/2007/09/25/find-replace-non-printable-characters-in-vim/>
<https://vi.stackexchange.com/questions/13379/how-to-interpret-ascii-codes-returned-by-ga-command>

All I needed to do in vi to figure out what the non-printing characters
were was to sidle my cursor up to the non-printable character and just
press "ga".

The "66" curly quotes showed up as ~S, ^S, [147], Hex 93, Octal 223
The "99" curly quotes showed up as ~T, ^T, [148], Hex 94, Octal 224

The funny thing is that this works fine individually:
:%s/\%x93/"/g (search for hex93 and replace it with straight quotes)
:%s/\%x94/"/g (search for hex94 and replace it with straight quotes)

But this isn't working for the whole file yet:
:%s/[\%x93,\%x94]/"/g

I'm working on why that multiple-search-and-replace syntax failed.
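The `ga` output suggests the file holds the raw Windows-1252 bytes 0x93 and 0x94. Going by Vim's `:help /[]`, a bracket collection accepts escapes such as `\x93` (written without the `%`) and needs no comma between items. So `:%s/[\x93\x94]/"/g` should be the one-command form for a buffer holding those raw bytes, though that is untested here. In a UTF-8 buffer the characters would be `\u201c` and `\u201d` instead. Below is a minimal Python sketch of the same byte-level substitution. It assumes the file is read as bytes, not text. The function name and the file name in the comment are hypothetical.

```python
import re

# Bytes 0x93 and 0x94 are the curly double quotes in Windows-1252.
CURLY_DOUBLE = re.compile(rb"[\x93\x94]")


def straighten_cp1252_bytes(data: bytes) -> bytes:
    """Replace cp1252 curly double-quote bytes with b'"' (0x22)."""
    return CURLY_DOUBLE.sub(b'"', data)


print(straighten_cp1252_bytes(b"the \x93quoted\x94 word"))  # b'the "quoted" word'

# Hypothetical file usage:
# with open("notes.txt", "rb") as f:
#     fixed = straighten_cp1252_bytes(f.read())
```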
harry newton
2017-10-09 08:21:58 UTC
Post by Mike S
I used the VB6 Asc() function to see what character code the two curly
quotes were, it returned character codes 147 and 148. I believe Word has
a similar VBA function, so you can check the character codes yourself e.g.
?asc("”")
?asc("“")
Thanks for stating that the "character code" for the opening & closing
curly quotes is decimal [147] and decimal [148] respectively.

I don't use Word much so I'm not at all familiar with how you did that
(even though you explained it); but all I need, in the end, is a way to
tell vi how to recognize these two characters.

Once I have the two characters figured out, it's simple to replace them:
:%s/[xy]/"/g
Which means, in my vi-speak...
:%s/ => Search the whole file for
[xy] => either "x" or "y"
/"/g => and replace all instances with the straight double quote.

I just have to figure out how to tell vi that "x" and "y" are decimal [147]
and [148] respectively.
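One detail is worth checking in that bracket pattern. In a bracketed character class, every character listed is a member, so a comma placed between the two items is also matched and replaced. For simple classes like this, Python's `re` uses the same bracket semantics as Vim. The following sketch uses `x` and `y` to stand in for the two quote characters, as in the post:

```python
import re

text = "x, y and x,y"

# [x,y] is a class of three characters: 'x', ',' and 'y'.
print(re.sub("[x,y]", '"', text))  # "" " and """
# [xy] is a class of just 'x' and 'y'; the commas survive.
print(re.sub("[xy]", '"', text))   # ", " and ","
```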
Post by Mike S
https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/asc-function
This worked when I pasted the text into a textbox
Bad: (“)(”)
Bad: (")(")
Dim s1 As String
s1 = Text1.Text
s1 = Replace(s1, Chr(147), Chr$(34))
s1 = Replace(s1, Chr(148), Chr$(34))
Text1.Text = s1
I believe you can use the replace function to do character
substitution in an entire Word doc using VBA. This might be close to
https://msdn.microsoft.com/en-us/library/office/aa211953(v=office.11).aspx
Is that approach useful?
Thank you for the suggested approach.
I admit you did a lot of work, which I appreciate.

The most important part of that work is the method for identifying the
characters, since there are admittedly more dastardly characters than just
the opening and closing curly doublequotes.

To be bluntly honest and open with you, I edit plain (pure?) text files all
day every day where MS Word is too ponderous to be useful except as a hack.

In this case, you proposed a valid hack, but when I look at the amount of
code, I have to compare it to the amount of code required to do the same
thing in vi, which is a comparison of this to this:

This is the pseudocode in Word to replace x & y with z globally:
Dim s1 As String
s1 = Text1.Text
s1 = Replace(s1, Chr(x), Chr$(z))
s1 = Replace(s1, Chr(y), Chr$(z))
Text1.Text = s1
This is the pseudocode in vi to replace x & y with z globally:
:%s/[xy]/z/g

If I'm going to run a global search and replace, I may as well do it in the
text editor that is most efficient for such things.

I was originally hoping there would be a "switch" in Word, that just did it
for me, so that I could use Word as an intermediary; but if I'm going to
run a global search and replace in Word, I may as well run that same global
search and replace in vi.

So right now, I think the only viable solution is to figure out two things,
one of which I think you've figured out for me (which I appreciate).

1. How to figure out what the character is (e.g., decimal [147] & [148])
2. How to tell vi to search for that character (replacing is easy).
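For step 1, a small Python sketch can list every distinct non-ASCII character in a piece of pasted text, along with its code point and Unicode name. This covers the "more dastardly characters" case as well as the two curly quotes. It assumes the text has already been decoded correctly. `unicodedata` is a standard-library module, and the function name is illustrative.

```python
import unicodedata


def non_ascii_report(text: str) -> list[tuple[str, str, str]]:
    """Return (character, 'U+XXXX', Unicode name) for each distinct non-ASCII char.

    Characters are listed in order of first appearance.
    """
    distinct = dict.fromkeys(ch for ch in text if ord(ch) > 127)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in distinct
    ]


for row in non_ascii_report("the \u201copening\u201d and \u2018single\u2019 quotes"):
    print(row)
```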
Char Jackson
2017-10-09 06:25:50 UTC
Post by harry newton
| Why would NotePad++ work when MS Office failed?
You don't need either of those. Paste into Notepad
and save as UTF-8. That option appears in a dropdown
underneath the Save As textbox, in the Save As window.
<https://practicaltypography.com/straight-and-curly-quotes.html>
Curly quotes are the quotation marks used in good typography.
There are four curly quote characters: the opening single quote
(‘), the closing single quote (’), the opening double quote (“),
and the closing double quote (”).
2. I pasted into a blank MS Word doc by typing <Windows+R> fixquote <Enter>
and then <Control+Alt+V> "Paste Special Unformatted Unicode Text".
3. MS Word 2007 didn't have a Save as "UTF-8" but it did have "Plain Text".
And yet, the text *still* had the non-printable non-keyboard characters!
Even when I selected <US-ASCII> as the encoding format.
<http://wetakepic.com/images/2017/10/09/fix3.jpg>
4. Having not used NotePad in centuries, I pasted into Notepad this time
Start > Run > notepad <Enter>
And saved as UTF-8.
Yikes! The result was even worse!
<http://wetakepic.com/images/2017/10/09/fix4.jpg>
I think I don't understand the problem because the result of that was a
"plain text" file which contained utterly unprintable non-keyboard
characters.
What did I do wrongly?
Check the page source from that page. You won't find the text that you
copied. What I suspect is that the site owner has deliberately put
roadblocks in the way to prevent people from copying text as easily as
dragging a mouse across it.

In other words, the problem isn't only your odd choice of text editor,
it's also the pages that you choose to copy from. If you're going to
copy from a site like that, be prepared to do some post processing.
Char Jackson
2017-10-09 05:31:43 UTC
Post by harry newton
Post by Char Jackson
I repeat, though, the obvious answer is to use another text editor. If
Notepad++ isn't to your liking, many of my colleagues have settled on
Ultra Edit or Textpad, so you might give those a try.
I'm looking for a solution for the cut-and-paste problem.
I have no problem cutting and pasting into an intermediary program.
In fact, I just tried MS Word as the intermediary; it failed.
<http://wetakepic.com/images/2017/10/09/smartquotes2.jpg>
Why would NotePad++ work when MS Office failed?
Does NotePad++ have a special curly-quote-to-keyboard-quote macro?
I copied from the sample URL that you previously provided and pasted
into Notepad++. There were no black boxes or funny characters. It's a
free program with a free download. I recommend checking it out.

Since GVIM obviously isn't working for you, I don't understand the
reluctance to find something that does.
harry newton
2017-10-09 07:52:48 UTC
Post by Char Jackson
I copied from the sample URL that you previously provided and pasted
into Notepad++. There were no black boxes or funny characters. It's a
free program with a free download. I recommend checking it out.
I was remiss in not noting that there's no problem "seeing" the dastardly
curly quotes in all sorts of editors, including all the Microsoft editors
tested (i.e., Notepad, Wordpad, Word, etc.).

I can even see the dastard quotes in Photoshop.

The problem is that whatever we want to call "pure text" editors don't see
the quotes.

Notepad isn't even close to a "real" text editor, so if anyone proposes it,
I have to state WHY I think Notepad is brain dead in editing functionality
compared to what an actual powerful text editor does.
Post by Char Jackson
Since GVIM obviously isn't working for you, I don't understand the
reluctance to find something that does.
The *only* reason I use vi day in and day out, every second of every day,
is that the editing of text is phenomenally efficient.

No mouse needed. No GUI needed. Just pure editing of text.

For example, I can edit all the first instances of "this" to replace them
with "that" in every line of the file in a split second without moving my
hands off the keyboard (which is the main point - efficiency).
:%s/this/that
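For readers unfamiliar with vi: without a trailing `g` flag, `:s` replaces only the first match on each line. The short Python sketch below shows the same per-line behaviour. One difference to note is that vi treats the pattern as a regular expression, while this sketch treats it as a literal string. The function name is illustrative.

```python
def replace_first_per_line(text: str, old: str, new: str) -> str:
    """Mimic vi's :%s/old/new (no g flag): replace the first match on each line only."""
    return "\n".join(line.replace(old, new, 1) for line in text.split("\n"))


print(replace_first_per_line("this and this\nthis too", "this", "that"))
```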

I'm sure *every* text editor can replace just the first instance in every
line of "this" with "that"; but no editor will ever do this kind of edit as
efficiently as does a good text editor such as Emacs or vi or whatever.

Notepad will *never* be in the list of good text editors when it comes to
efficiently doing the things that you do thousands of times each and every
day. It just won't. Neither will MS Word. Nor Open Office.

Good "pure" text editors exist for a reason, and that's efficiency.
Notepad is garbage for text editing. Sorry to say this so bluntly.

I can't imagine what Notepad even does, that MS Word doesn't already do,
except that it's free (sort of) and MS Word isn't free.

I didn't mean to dis Notepad; I'm just explaining that it's not efficient
as a text editor compared to vi efficiency in edits.
Char Jackson
2017-10-09 08:26:56 UTC
Post by harry newton
Post by Char Jackson
I copied from the sample URL that you previously provided and pasted
into Notepad++. There were no black boxes or funny characters. It's a
free program with a free download. I recommend checking it out.
I was remiss in not noting that there's no problem "seeing" the dastardly
curly quotes in all sorts of editors, including all the Microsoft editors
tested (i.e., Notepad, Wordpad, Word, etc.).
I can even see the dastard quotes in Photoshop.
The problem is that whatever we want to call "pure text" editors don't see
the quotes.
Notepad isn't even close to a "real" text editor, so if anyone proposes it,
I have to state WHY I think Notepad is brain dead in editing functionality
compared to what an actual powerful text editor does.
Post by Char Jackson
Since GVIM obviously isn't working for you, I don't understand the
reluctance to find something that does.
The *only* reason I use vi day in and day out, every second of every day,
is that the editing of text is phenomenally efficient.
No mouse needed. No GUI needed. Just pure editing of text.
For example, I can edit all the first instances of "this" to replace them
with "that" in every line of the file in a split second without moving my
hands off the keyboard (which is the main point - efficiency).
:%s/this/that
I'm sure *every* text editor can replace just the first instance in every
line of "this" with "that"; but no editor will ever do this kind of edit as
efficiently as does a good text editor such as Emacs or vi or whatever.
Notepad will *never* be in the list of good text editors when it comes to
efficiently doing the things that you do thousands of times each and every
day. It just won't. Neither will MS Word. Nor Open Office.
Good "pure" text editors exist for a reason, and that's efficiency.
Notepad is garbage for text editing. Sorry to say this so bluntly.
I can't imagine what Notepad even does, that MS Word doesn't already do,
except that it's free (sort of) and MS Word isn't free.
I didn't mean to dis Notepad; I'm just explaining that it's not efficient
as a text editor compared to vi efficiency in edits.
I would only point out that no one suggested you replace your editor
with Notepad. You know that, right?

As for the rest of it, I think what you're saying is that you've fully
memorized the keyboard commands to do the tasks that you routinely want
to do, and that's fine, but it's equally fine for the rest of us to
point out that your chosen text editor is far from perfect. You've shown
us that. So what's most interesting to me is that in many other aspects
of your computing life, you go on endless quests to find the best
freeware to do a specific task, but in this case you saddle yourself
with a tool that's clearly not up to the task, but a tool that you are
intimately familiar with. Very interesting. :)

I shall continue to watch with interest. :)
harry newton
2017-10-09 09:08:30 UTC
Post by Char Jackson
I would only point out that no one suggested you replace your editor
with Notepad. You know that, right?
At first, I was hoping that MS Word or Notepad could be used as an
"intermediary" temporary file - but even they didn't save the output in a
pure-text format, at least not in my tests of saving MS Word as
"unformatted" and my tests saving Notepad as "UTF-8" text.

As Mayayana said, you have to know a lot about encoding to figure out why,
so, I gave up on that approach once the obvious tests each failed.

The *efficient* answer is for me to simply concentrate on one thing and on
that one thing only, which is to learn how to tell vi how to recognize a
decimal [147] or decimal [148] which are apparently the encoding for the
opening and closing curly quotes.
Post by Char Jackson
As for the rest of it, I think what you're saying is that you've fully
memorized the keyboard commands to do the tasks that you routinely want
to do, and that's fine, but it's equally fine for the rest of us to
point out that your chosen text editor is far from perfect.
I don't think there is anyone on this planet who can safely assert that
common text-editing tasks aren't more efficient in the core text editors
(e.g., vi, emacs, nano, atom, etc.) than they are in the behemoth editors
(such as MS Word).

I just googled "most efficient text editors" and it seems this discussion
of which is the "most efficient" of text editors is an endless discussion -
but - the point is that Microsoft products will *never* be on those lists.

What is the most efficient TextEditor ...
https://www.quora.com/What-is-the-most-efficient-TextEditor-for-coding-and-compiling-why

One reason, of course, is that the efficient editors are born on Linux:
https://www.maketecheasier.com/linux-text-editors/
5 Best Linux Text Editors
1. Vim
2. Emacs
3. Geany
4. Gedit
5. Sublime

While that's a Linux hit (and while I'd only consider the top two), the
most important point is their first sentence: "Debates on which one is the
best have been going on for years. Everyone has an opinion; everyone has a
favorite, a certain one they absolutely swear by."

So we will just have to agree to disagree in that I assert that vi is
generally on top of the short list of the most efficient "pure text"
editors extant on Windows.

If anyone else asserts that Notepad or MS Word (or any other behemoth
editor) is more efficient, I'm not going to argue with them for more than a
post or two - since this is an age-old debate sort of along the lines of
whether iOS or Android or Mac or Windows is better.
Post by Char Jackson
You've shown
us that. So what's most interesting to me is that in many other aspects
of your computing life, you go on endless quests to find the best
freeware to do a specific task, but in this case you saddle yourself
with a tool that's clearly not up to the task, but a tool that you are
intimately familiar with. Very interesting. :)
You are astutely correct in that I consider myself a freeware expert.
That means I find the "best" freeware for the tasks that I do most.

For me, the definition of "best" pure text editor is an editor that most
efficiently edits pure text.

One thing about me is I'm a bare-bones kind of guy.
I'm KISS if nothing else.

If someone thinks Notepad is the most efficient freeware for making
constant and complex edits to hundreds of "puretext" files a day, then
maybe Notepad has something that I don't know about.

I don't know everything there is to know about freeware.
I'm always learning.

So maybe Notepad is far more efficient than vi is.
But then why can't I find Notepad yet on any list of the most efficient
text editors (at least not in my cursory search just now)?
NotMe
2017-10-08 21:25:13 UTC
Post by harry newton
This ponderous Microsoft Office approach might work - but I'm hoping for a
far simpler and less monolithic solution to the basic problem that
everyone should have if they cut and paste into text from the web.
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>
Does not, from memory, Paste Special-Plain Text or some equivalent do that?
NotMe
2017-10-08 21:32:28 UTC
Post by NotMe
Post by harry newton
This ponderous Microsoft Office approach might work - but I'm hoping for a
far simpler and less monolithic solution to the basic problem that
everyone should have if they cut and paste into text from the web.
<https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963A0-BC5F-486B-9C9D-0EC511A8FB8F>
Does not, from memory, Paste Special-Plain Text or some equivalent do that?
Should be Unformatted Text.
harry newton
2017-10-09 05:16:58 UTC
Post by NotMe
Post by NotMe
Does not, from memory, Paste Special-Plain Text or some equivalent do that?
Should be Unformatted Text.
Here's what I just tried to follow that suggestion as a tested solution.

1. I added a quick way to bring up a blank intermediary MS Word document.
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\fixquote.exe
Default = C:\tmp\junk\fixquote.docx
That brings up a blank Word doc when the "fixquote" keyword is typed.
<Windows + R>fixquote<Enter>

2. I copied the problematic text into my clipboard from any web site:
<https://practicaltypography.com/straight-and-curly-quotes.html>

Here is what I copied into my Windows clipboard:

Straight quotes are the two generic vertical quotation marks
located near the return key: the straight single quote (')
and the straight double quote (").

Curly quotes are the quotation marks used in good typography.
There are four curly quote characters: the opening single quote
(‘), the closing single quote (’), the opening double quote (“),
and the closing double quote (”).

3. I pasted my clipboard "without formatting" into that Word doc:
<Control + Alt + V>[x]Unformatted text or Unformatted Unicode text
Both failed.

In summary, pasting "unformatted" text is great to remove formatting.
But it doesn't convert those dastardly twisted quotes into straight quotes.
Wolf K
2017-10-08 00:55:46 UTC
Post by Jason
Post by harry newton
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...
Same problem on typewriters, which is why some people think the curly
quotes aren't normal.
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
Mike S
2017-10-08 05:30:46 UTC
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>
I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes drive
me nuts!
<http://i67.tinypic.com/2h5mjbr.jpg>
I tried cutting from the web and pasting into MS Word and then cutting from
MS Word and pasting into the text file - but the dastardly curly quotes
were still there.
I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Here's just one sample but the web is filled with dastardly curly quotes!
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
Looks like you may be able to do this within Word.

How to change smart or curly quotes to straight quotes in Microsoft Word
On the File tab, click Options.
Click Proofing, and then click AutoCorrect Options.
In the AutoCorrect dialog box, do the following:
- Click the AutoFormat As You Type tab, and under Replace as you type,
select or clear the "Straight quotes" with “smart quotes” check box.
- Click the AutoFormat tab, and under Replace, select or clear the
"Straight quotes" with “smart quotes” check box.
Click OK.
https://www.windowscentral.com/change-smart-quotes-straight-quotes-microsoft-office-word-outlook-powerpoint

Change curly quotes to straight quotes and vice versa
https://support.office.com/en-us/article/Change-curly-quotes-to-straight-quotes-and-vice-versa-017963a0-bc5f-486b-9c9d-0ec511a8fb8f

Replacing smart quotes with regular quotes
https://superuser.com/questions/1054418/replacing-smart-quotes-with-regular-quotes

Does this approach do what you need?
harry newton
2017-10-09 07:52:51 UTC
Post by Mike S
Looks like you may be able to do this within Word.
Does this approach do what you need?
Yes.
No.

Microsoft Office has the "smarts" to eliminate all the non-standard
typography characters - so if that's the only solution - that's the only
solution.

Since I'm already editing in text (vi in my case), I was hoping for a
simpler solution than bringing up a behemoth text editor - but - if MS
Office is the *only* solution - then that is the fact.

I'm hoping for a *simpler* solution than Microsoft Office though...
Mike S
2017-10-09 07:54:28 UTC
Post by Mike S
Looks like you may be able to do this within Word.
Does this approach do what you need?
Yes. No.
Microsoft Office has the "smarts" to eliminate all the non-standard
typography characters - so if that's the only solution - that's the only
solution.
Since I'm already editing in text (vi in my case), I was hoping for a
simpler solution than bringing up a behemoth text editor - but - if MS
Office is the *only* solution - then that is the fact.
I'm hoping for a *simpler* solution than Microsoft Office though...
The VBA editor is actually fast and easy to use.

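Sub Demo()
' Replace curly quotes with straight quotes throughout the active document.
Dim bReplaceQuotes As Boolean
Application.ScreenUpdating = False
' Remember the user's setting, then switch it off so Word does not turn
' the straight replacement quotes back into curly ones.
bReplaceQuotes = Options.AutoFormatAsYouTypeReplaceQuotes
Options.AutoFormatAsYouTypeReplaceQuotes = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchWildcards = True
    ' Opening and closing double quotes -> Chr(34)
    .Text = "[“”]"
    .Replacement.Text = Chr(34)
    .Execute Replace:=wdReplaceAll
    ' Opening and closing single quotes -> Chr(39)
    .Text = "[‘’]"
    .Replacement.Text = Chr(39)
    .Execute Replace:=wdReplaceAll
  End With
End With
' Restore the original setting rather than forcing it on.
Options.AutoFormatAsYouTypeReplaceQuotes = bReplaceQuotes
Application.ScreenUpdating = True
End Sub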

I'll leave it to you to do the VSTO adaptation.

Cheers
Paul Edstein
[MS MVP - Word]

https://social.msdn.microsoft.com/Forums/office/en-US/d0610fe3-d06c-412f-b765-117ec6f27ea6/replace-smart-curly-quotes-with-straight-quotes?for
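Outside Word, the macro's two wildcard replacements can be mirrored with Python's `re` module. This is a sketch that assumes the text is already a decoded str. The two bracket classes play the same role as the macro's `[“”]` and `[‘’]` wildcards, and every other character is left alone. The names are illustrative.

```python
import re

# Same two bracket-class replacements the Word macro performs with wildcards.
DOUBLE_CURLY = re.compile("[\u201c\u201d]")  # opening/closing double -> Chr(34)
SINGLE_CURLY = re.compile("[\u2018\u2019]")  # opening/closing single -> Chr(39)


def straighten_quotes(text: str) -> str:
    text = DOUBLE_CURLY.sub('"', text)
    return SINGLE_CURLY.sub("'", text)


print(straighten_quotes("\u2018single\u2019 and \u201cdouble\u201d"))
```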
Whiskers
2017-10-08 11:54:27 UTC
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>
I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes drive
me nuts!
<http://i67.tinypic.com/2h5mjbr.jpg>
I tried cutting from the web and pasting into MS Word and then cutting from
MS Word and pasting into the text file - but the dastardly curly quotes
were still there.
I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Here's just one sample but the web is filled with dastardly curly quotes!
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
I think the problem isn't that some quotes are curly (which is what they
should be), but that some documents and web pages are generated using
software that ignores the standard way of coding such characters - so
that a copy/paste into standard-observing software reveals the
discrepancies.

Perhaps "demoroniser - correct moronic and gratuitously incompatible
Microsoft HTML" <http://www.fourmilab.ch/webtools/demoroniser/> is what
you're looking for. That page explains the problem as understood by the
author, and offers his solution as a Perl program. I don't know if (or
how) that works on Windows systems.
--
-- ^^^^^^^^^^
-- Whiskers
-- ~~~~~~~~~~
J. P. Gilliver (John)
2017-10-08 12:47:24 UTC
In message <***@ID-107770.user.individual.net>,
Whiskers <***@operamail.com> writes:
[]
Post by Whiskers
I think the problem isn't that some quotes are curly (which is what they
should be), but that some documents and web pages are generated using
[Who says they should (-:?]
Post by Whiskers
software that ignores the standard way of coding such characters - so
that a copy/paste into standard-observing software reveals the
discrepancies.
The OP wants to use a plain-text editor that only uses standard ASCII
(not "extended ASCII" codes; i.e. only characters between 32 and 126
decimal [plus newline]). He hasn't said why yet, but I understand what
he wants. (I was going to say "... like Notepad", but Notepad does allow
so-called "Extended ASCII", i.e. one particular set of the codes up to
255.) He is hoping for something that will render such text into
nearest-equivalent (such as quotes that have directional qualities all
into code 34 decimal).
[]
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

"In the _car_-park? What are you doing there?" "Parking cars, what else does
one do in a car-park?" (First series, fit the fifth.)
Wolf K
2017-10-08 15:22:18 UTC
Post by J. P. Gilliver (John)
[]
Post by Whiskers
I think the problem isn't that some quotes are curly (which is what they
should be), but that some documents and web pages are generated using
[Who says they should (-:?]
Post by Whiskers
software that ignores the standard way of coding such characters - so
that a copy/paste into standard-observing software reveals the
discrepancies.
The OP wants to use a plain-text editor that only uses standard ASCII
(not "extended ASCII" codes; i.e. only characters between 32 and 126
decimal [plus newline]). He hasn't said why yet, but I understand what
he wants. (I was going to say "... like Notepad", but Notepad does allow
so-called "Extended ASCII", i.e. one particular set of the codes up to
255.) He is hoping for something that will render such text into
nearest-equivalent (such as quotes that have directional qualities all
into code 34 decimal).
[]
NB: ASCII is not ANSI. ANSI is ASCII plus codes 128 to 255.

Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
quotes.

As I recall it, characters 128 to 255 used to be called "extended
ASCII", way back in the Dark Ages, when people could write printer
drivers by creating a list of escape codes and two- or three-byte codes
that described the dot pattern in the glyph matrix....

Another mess of esoteric useless knowledge.
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
J. P. Gilliver (John)
2017-10-08 16:01:35 UTC
In message <NorCB.42035$***@fx18.iad>, Wolf K
<***@sympatico.ca> writes:
[]
Post by Wolf K
NB: ASCII is not ANSI. ANSI is ASCII plus codes 128 to 255.
Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
quotes.
As I recall it, characters 128 to 255 used to be called "extended
ASCII", way back in the Dark Ages, when people could write printer
Though that name for them gained wide circulation (and I sometimes use
it), I did read somewhere that it was never an official designation.
Post by Wolf K
drivers by creating a list of escape codes and two- or three-byte codes
that described the dot pattern in the glyph matrix....
Another mess of esoteric useless knowledge.
Hi from a fellow dinosaur ... (-:
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
Mayayana
2017-10-08 16:06:53 UTC
"Wolf K" <***@sympatico.ca> wrote

| > The OP wants to use a plain-text editor that only uses standard ASCII
| > (not "extended ASCII" codes; i.e. only characters between 32 and 126
| > decimal [plus newline]). He hasn't said why yet, but I understand what
| > he wants. (I was going to say "... like Notepad", but Notepad does allow
| > so-called "Extended ASCII", i.e. one particular set of the codes up to
| > 255.) He is hoping for something that will render such text into
| > nearest-equivalent (such as quotes that have directional qualities all
| > into code 34 decimal).
| > []
|
| NB: ASCII is not ANSI. ANSI is ASCII plus codes 128 to 255.
|
| Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
| quotes.
|

171 and 187 are double chevrons. That's not the same
as curly quotes. You're thinking of 147 and 148.

But that's true only for the standard English codepage.
If you're Russian you'll see Cyrillic characters. Something
like a capital Y and an oval with a vertical line through it.
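The point that a byte only means something relative to a codepage is easy to check with Python's built-in codecs, which use Python's own codec names. The Russian examples are additions, not from the post. In the Russian DOS codepage 866, bytes 147 and 148 decode to У and Ф, which plausibly matches the "capital Y and an oval" description. The Windows Cyrillic codepage 1251, by contrast, happens to put the curly quotes at the same positions as 1252.

```python
# The same two bytes, read through different single-byte codepages.
raw = bytes([147, 148])

for codec in ("cp1252", "cp1251", "cp866"):
    print(codec, raw.decode(codec))

# ANSI 171 and 187 in Windows-1252 are the double chevrons (guillemets).
print(bytes([171, 187]).decode("cp1252"))
```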

ASCII is standard in all uses and matches the same
numbers in unicode. It specifies a basic western character
set for byte values 0 to 127. ANSI uses a "local codepage" to
define characters 128-255, while retaining the ASCII values
up to 127. The standard webpage encoding used to be
Windows English codepage ANSI. (ASCII in most cases.)
Now UTF-8 is more common.

UTF-8 is a way to express unicode as a sequence of single bytes.
UTF-16, what's usually just referred to as unicode,
encodes thousands of characters in 2 bytes, so each character
can have its own specific encoding number in order to fit
English, Russian and everything else. ASCII and ANSI use
a one-byte-per-character encoding, except with a few
Asian languages.

In order to internationalize the Web with minimal upset,
UTF-8 became standard. It encodes all of unicode as a
stream of single bytes. The first 128 values are still ASCII.
The second 128 are used only to build multi-byte sequences of 2 to 4
bytes. Thus all languages can be encoded in one system.
It's still read 1 byte at a time and most webpages don't
change because most are still basically ASCII. (Whereas
if we'd converted to unicode, all webpage files would
have had to be converted to 2-byte encoding, making for
a lot of work and doubling the size of HTML files.)
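You can see this variable width directly. In UTF-8, ASCII characters stay at one byte, while other characters take two to four bytes. The byte values 128 to 255 occur only inside those multi-byte sequences. The sample characters below are arbitrary. Note that UTF-16 also needs four bytes (a surrogate pair) for characters outside the Basic Multilingual Plane, such as the emoji in the list.

```python
samples = ["A", "\u00e9", "\u201c", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")
    print(f"U+{ord(ch):04X}: UTF-8 {utf8.hex(' ')} ({len(utf8)} bytes), "
          f"UTF-16 {len(utf16)} bytes")
```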

The problem comes when UTF-8 is read as ANSI. (Most
text is still handled in one-byte-per-character ASCII/ANSI
encoding. Even things like JPG EXIF tags and PE file
import headers are ASCII/ANSI.)
There might be, say, 3 bytes in UTF-8 that
indicate a left curly quote. I don't know exactly offhand.
But it might be, say, capital A with an umlaut, a 1/4 sign,
and a Euro sign. In the browser it's a left curly quote. In
Notepad it shows up as 3 wacky characters. The two
programs are interpreting the bytes by different standards.
So the text is corrupted. And that's just in English. A
browser reading the UTF-8 can display it properly and in
most cases will "sniff" the page to identify it even if the
HTML code does not specify. But when those same bytes
are read as ANSI you see the ANSI characters. You
might see the Euro. A Russian will see something else. A
Greek will see a third thing.
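The "wacky characters" can be reproduced in Python. This sketch assumes the misreading uses the Western codepage, with Python's cp1252 codec standing in for whatever Notepad actually does. The result is the familiar "â€œ" and "â€™" junk. One caveat: Python's cp1252 codec leaves byte 0x9D undefined. That is the last byte of the right double curly quote, so that character raises an error here and is left out of the sample.

```python
def as_notepad_might_show(s: str) -> str:
    """UTF-8 bytes misread with the Western ANSI codepage (Python's cp1252)."""
    return s.encode("utf-8").decode("cp1252")

def undo(garbled: str) -> str:
    """Reverse the misreading: back to the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")

sample = "\u201cit\u2019s"          # left curly quote, then it's with a curly apostrophe
garbled = as_notepad_might_show(sample)
print(ascii(garbled))               # '\xe2\u20ac\u0153it\xe2\u20ac\u2122s'
assert undo(garbled) == sample
```

Each curly character turns into exactly three junk characters, one for each of its UTF-8 bytes.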

What Harry is asking for is a simple way to convert
UTF-8 to ANSI using the standard English codepage. That
requires converting the string by parsing
the bytes. When the parser encounters bytes of 128+
it would need to decide how to treat them. Is it an
ANSI bullet, character 149 in English? Or is byte 149
one of 2, 3, or 4 bytes together indicating a character
in UTF-8? If it turns out to be, say, 3 bytes that render a
left curly quote in UTF-8, some kind of filter has to recognize
that exact pattern and say, "Oh, that's a quote. We'll just
substitute character 34 for those 3 bytes."
So Harry's solution has to treat each specific UTF-8
character and decide what to substitute. It's not a 1-to-1
correspondence. In other words, Notepad already translated
the UTF-8 to ANSI, but now it has to be transliterated.
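Here is a minimal Python sketch of such a transliterating filter. It assumes the text has already been decoded correctly, for example read from a file opened as UTF-8. The substitution table is my own hypothetical choice, not any standard, so extend it as needed. Anything still outside ASCII after the substitutions becomes "?", so no stray character gets through silently.

```python
# Hypothetical substitution table: many Unicode characters -> one ASCII stand-in.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})

def to_plain_ascii(text: str) -> str:
    """Transliterate known 'smart' characters, then force the rest to ASCII."""
    straightened = text.translate(STRAIGHTEN)
    # errors="replace" turns any character not in the table into "?"
    return straightened.encode("ascii", errors="replace").decode("ascii")

def is_plain_text(text: str) -> bool:
    """True if every character is printable ASCII (32-126) or a line break."""
    return all(32 <= ord(c) <= 126 or c in "\r\n" for c in text)

sample = "\u201cIt\u2019s a \u2018smart\u2019 quote\u201d \u2014 really\u2026"
print(to_plain_ascii(sample))   # "It's a 'smart' quote" -- really...
```

Encoding to cp1252 instead of ASCII would not help here. That codepage has its own curly quotes at 145-148, so they would pass straight through.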

If those quotes were written as character 34 in the first
place then the encoding would not matter. Everyone would
see ", because " is in the ASCII range.

Whiskers made an interesting point that I wasn't aware of:
The page he links says that MS Office products have an option
for fancy characters like curly quotes. Maybe that helps explain
why so many of them are on webpages. MS Office users are
among the most parochial of all computer users. They're usually
not tech-literate but are computer-literate. The result is millions
of people who equate their computer with MS Office and
assume the whole world also uses MS Office. They're the people
who send emails from Word or send a 60,000 byte DOC file to
communicate 1 sentence of 24 bytes. Many of those same people
are probably also creating webpages from MS Word, oblivious
of the travesty.
Andy Burns
2017-10-08 16:11:10 UTC
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
Char Jackson
2017-10-08 17:32:20 UTC
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.

I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"

Millennials... Thanks, Twitter!
Wolf K
2017-10-08 17:53:20 UTC
Post by Char Jackson
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.
I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"
Millennials... Thanks, Twitter!
# as "pound sign" is engineering usage. Learned it 61 years ago.... Also
used kip to mean 1,000 lbs.

BTW, robo-instructions to "enter account number" usually continue with
"... and the pound sign."
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
Paul
2017-10-08 18:16:48 UTC
Post by Wolf K
Post by Char Jackson
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.
I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"
Millennials... Thanks, Twitter!
# as "pound sign" is engineering usage. Learned it 61 years ago.... Also
used kip to mean 1,000 lbs.
BTW, robo-instructions to "enter account number" usually continue with
"... and the pound sign."
Wikipedia files that symbol under "Number Sign".

https://en.wikipedia.org/wiki/Number_sign

I'd tried a search on Octothorpe, and ended up there.

Paul
J. P. Gilliver (John)
2017-10-09 02:10:29 UTC
Post by Wolf K
Post by Char Jackson
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.
I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"
Millennials... Thanks, Twitter!
# as "pound sign" is engineering usage. Learned it 61 years ago....
Also used kip to mean 1,000 lbs.
BTW, robo-instructions to "enter account number" usually continue with
"... and the pound sign."
Not here. "Press the hash key".
I think the normal UK name for the # character is just "hash". (I think
hashtag comes from that: it's a tag which consists of the hash character
followed by some other characters.)
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

If you bate your breath do you catch a lung fish? (Glynn Greenwood 1996-8-23.)
Whiskers
2017-10-09 09:58:02 UTC
Post by J. P. Gilliver (John)
Post by Wolf K
Post by Char Jackson
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.
I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"
Millennials... Thanks, Twitter!
# as "pound sign" is engineering usage. Learned it 61 years ago....
Also used kip to mean 1,000 lbs.
BTW, robo-instructions to "enter account number" usually continue with
"... and the pound sign."
Not here. "Press the hash key".
I think the normal UK name for the # character is just "hash". (I think
hashtag comes from that: it's a tag which consists of the hash character
followed by some other characters.)
The old state-owned British telephone company called the # sign 'gate'.
--
-- ^^^^^^^^^^
-- Whiskers
-- ~~~~~~~~~~
Peter Moylan
2017-10-09 04:32:48 UTC
Post by Wolf K
# as "pound sign" is engineering usage. Learned it 61 years ago.... Also
used kip to mean 1,000 lbs.
US engineering use, I presume. Australian engineers would never call the
hash character a pound sign.
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
Mayayana
2017-10-08 18:45:15 UTC
"Andy Burns" <***@andyburns.uk> wrote

| > ASCII is standard in all uses
|
| Except when UK users want a pound sign £ and get a hash symbol # (yes I
| realise Americans may call that a pound sign)

But isn't your pound sign encoded in the ANSI 128+
range?

I don't say pound for #. It's used in things like price
signs on produce sometimes and people recognize
it in context as meaning pound, but I call it a hash
sign. Microsoft, with their maddening habit of misusing
language in marketing, hijacked it to mean "sharp".
Of course in music it means that, but they named
a programming language C# and then insisted it must
be pronounced "C sharp". It's a sort of passive
aggressive way of forcing people to describe the
language as superior. A play on C++.

That reminds me of a comedian I once saw talking
about pretentious use of language. He was complaining
about a flash-in-the-pan musical group named Sade, but
pronounced Shah-DAY: "Shah-DAY. Give me a break.
S-A-D-E doesn't spell Shah-DAY. I spell my name
D-A-V-E, but I don't pronounce it "Bob". :)
Andy Burns
2017-10-08 18:59:55 UTC
But isn't your pound sign encoded in the ANSI 128+ range?
It is, but back in the early 80's it was pretty common for printers to
have a DIP switch to flick between US and UK mode, so that ASCII code 35
printed a £ instead of a #
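For the record, this Python snippet shows where the two signs live. "#" is code 35 in 7-bit ASCII. "£" sits at 163 in the Western ANSI codepage and in ISO-8859-1, and plain ASCII has no slot for it at all. The British 7-bit national variant (BS 4730, a.k.a. ISO 646-GB) did the printer trick officially, putting £ at 35.

```python
# '#' is in the 7-bit ASCII range; the pound sign is not.
assert ord("#") == 35
assert "\u00a3".encode("cp1252") == b"\xa3"      # 163 in the Western ANSI codepage
assert "\u00a3".encode("latin-1") == b"\xa3"     # same value in ISO-8859-1
assert "\u00a3".encode("utf-8") == b"\xc2\xa3"   # two bytes in UTF-8

try:
    "\u00a3".encode("ascii")
except UnicodeEncodeError:
    print("no pound sign in plain ASCII")
```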
J. P. Gilliver (John)
2017-10-09 02:13:22 UTC
Post by Andy Burns
But isn't your pound sign encoded in the ANSI 128+ range?
It is, but back in the early 80's it was pretty common for printers to
have a DIP switch to flick between US and UK mode, so that ASCII code
35 printed a £ instead of a #
I think I also vaguely remember some printers being settable to print
the pound sign (by which I do _not_ mean #) when you told them to print
a $. (I don't think that was ever very popular, because among other
things $ was used a lot in programming languages then, and listings with
a lot of pound signs in them were confusing to say the least.)
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

If you bate your breath do you catch a lung fish? (Glynn Greenwood 1996-8-23.)
Mayayana
2017-10-09 04:49:34 UTC
"J. P. Gilliver (John)" <G6JPG-***@255soft.uk> wrote

| I think I also vaguely remember some printers being settable to print
| the pound sign (by which I do _not_ mean #) when you told them to print
| a $. (I don't think that was ever very popular, because among other
| things $ was used a lot in programming languages then, and listings with
| a lot of pound signs in them were confusing to say the least.)

That's an interesting point. As far as I know all programming
languages are still in American English. I have the luxury of not
noticing until I download sample code from someone foreign
and see all the function names and keywords in English, but
with incomprehensible variable names. Like this snippet:

For i = 1 To MengeZeilen                  ' MengeZeilen = "number of lines"
ReDim Zeilenbuffer(ZL - 1)                ' Zeilenbuffer = "line buffer"
CopyMemory Zeilenbuffer(0), Buffer(Bufferstand + 1), ZL   ' Bufferstand = "buffer position"

Seeing that, I realize how hard it must be for foreigners
to learn programming. The language of the variable names
is apparently German, which makes it very difficult for me
to figure out what the code is doing, even though I understand
the function names, keywords and operators.
Jerry Friedman
2017-10-08 19:47:43 UTC
Post by Mayayana
| > ASCII is standard in all uses
|
| Except when UK users want a pound sign £ and get a hash symbol # (yes I
| realise Americans may call that a pound sign)
But isn't your pound sign encoded in the ANSI 128+
range?
I don't say pound for #. It's used in things like price
signs on produce sometimes and people recognize
it in context as meaning pound, but I call it a hash
sign. Microsoft, with their maddening habit of misusing
language in marketing, hijacked it to mean "sharp".
Of course in music it means that, but they named
a programming language C# and then insisted it must
be pronounced "C sharp". It's a sort of passive
aggressive way of forcing people to describe the
language as superior. A play on C++.
That reminds me of a comedian I once saw talking
about pretentious use of language. He was complaining
about a flash-in-the-pan musical group named Sade, but
pronounced Shah-DAY: "Shah-DAY. Give me a break.
S-A-D-E doesn't spell Shah-DAY.
I'm not especially fond of Sade, but they were hardly a flash in the
pan. And that's really how to pronounce Sade Adu's name in Yoruba.
Post by Mayayana
I spell my name
D-A-V-E, but I don't pronounce it "Bob". :)
He gets points for not saying "Da-VAY".
--
Jerry Friedman
Jack Campin
2017-10-08 19:57:58 UTC
Post by Mayayana
That reminds me of a comedian I once saw talking
about pretentious use of language. He was complaining
about a flash-in-the-pan musical group named Sade, but
pronounced Shah-DAY: "Shah-DAY. Give me a break.
S-A-D-E doesn't spell Shah-DAY. I spell my name
D-A-V-E, but I don't pronounce it "Bob". :)
Sade (pronounced Sha-day) is a Nigerian-born English singer who
pronounces her name that way because that's how you say it in
Yoruba, which is her father's native language. The band was
named after her.

https://en.wikipedia.org/wiki/Sade_%28singer%29

[Followups set to aue]

-----------------------------------------------------------------------------
e m a i l : j a c k @ c a m p i n . m e . u k
Jack Campin, 11 Third Street, Newtongrange, Midlothian EH22 4PU, Scotland
mobile 07895 860 060 <http://www.campin.me.uk> Twitter: JackCampin
Mayayana
2017-10-08 20:30:35 UTC
"Jack Campin" <***@purr.demon.co.uk> wrote

| Sade (pronounced Sha-day) is a Nigerian-born English singer who
| pronounces her name that way because that's how you say it in
| Yoruba, which is her father's native language. The band was
| named after her.
|

Yes, I gathered that. Though the link says her name
is actually Helen and Sade is a nickname based on part
of her middle name.

Whether or not the name is "authentic", the whole
presentation fit and I could see why the comedian
made the joke. The topic was excessive valorizing
of language, especially for marketing purposes. The
way I saw it, from their videos, the band Sade was
marketing a sexy, soulful, stylish, moodiness. The kind
of smoky, elegant swank one might like for background
music in an upscale bar. Hot jazz. Passion. Living to the
hilt. But marketing was arguably all they achieved. (As
does hot jazz, for that matter.) There was an over-the-
top feel to it and the vocalist was not notably sexy
nor soulful.
The name, then, seemed to fit the strategy, thus
being easy pickings for a comedian. That, of course,
is only my musically-untrained personal opinion. I didn't
mean to upset Sade fans. But even if you loved their
music wouldn't you agree they were peddling swank?
I don't imagine for a moment that the French-ish hint
of elegance that "ShahDAY" conveys in English, and
the secondary association with Marquis de Sade, went
unrecognized when they were naming the band. It's
exotic. In other words, if the lead vocalist were in the
habit of using her first name, Helen, something tells
me they wouldn't have named the band "Helen"....
Though I suppose they still could have pronounced
that ShahDAY. :)
Peter Duncanson [BrE]
2017-10-08 20:35:00 UTC
Post by Mayayana
| > ASCII is standard in all uses
|
| Except when UK users want a pound sign £ and get a hash symbol # (yes I
| realise Americans may call that a pound sign)
But isn't your pound sign encoded in the ANSI 128+
range?
I don't say pound for #. It's used in things like price
signs on produce sometimes and people recognize
it in context as meaning pound, but I call it a hash
sign. Microsoft, with their maddening habit of misusing
language in marketing, hijacked it to mean "sharp".
Of course in music it means that, but they named
a programming language C# and then insisted it must
be pronounced "C sharp". It's a sort of passive
aggressive way of forcing people to describe the
language as superior. A play on C++.
That reminds me of a comedian I once saw talking
about pretentious use of language. He was complaining
about a flash-in-the-pan musical group named Sade, but
pronounced Shah-DAY: "Shah-DAY. Give me a break.
S-A-D-E doesn't spell Shah-DAY. I spell my name
D-A-V-E, but I don't pronounce it "Bob". :)
Well of course not. D-A-V-E should be pronounced "Dah-VAY".

The band Sade is a bit more than a flash in the pan. It won four
Grammys. It can sometimes be difficult to work out whether a mention of
"Sade" refers to the band or its singer Sade from whom it gets its name.
--
Peter Duncanson, UK
(in alt.usage.english)
Peter Moylan
2017-10-09 04:44:47 UTC
I don't say pound for #. It's used in things like price signs on
produce sometimes and people recognize it in context as meaning
pound, but I call it a hash sign. Microsoft, with their maddening
habit of misusing language in marketing, hijacked it to mean
"sharp". Of course in music it means that, but they named a
programming language C# and then insisted it must be pronounced "C
sharp".
The musical sharp sign (♯) looks a lot like the hash sign, but it's not
quite the same shape. Here they are side by side: ♯#

So that programming language is really "C hash", perhaps to suggest that
they made a hash of the design.
It's a sort of passive aggressive way of forcing people to describe
the language as superior. A play on C++.
Or D-.
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
Ken Blake
2017-10-08 19:18:07 UTC
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
Because of the potential confusion, I always prefer to call it a
"number sign."
Stefan Ram
2017-10-08 19:22:43 UTC
Post by Ken Blake
Post by Andy Burns
Post by Mayayana
ASCII is standard in all uses
Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
Because of the potential confusion, I always prefer to call it a
"number sign."
"Number sign" also happens to be the designation for
this character ("#") in both ASCII 1968 and Unicode.

That's the reason I use "number sign", too.
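Python's unicodedata module reports the official Unicode character names. It confirms both the "number sign" name and Peter's point that the musical sharp is a separate character.

```python
import unicodedata

for ch in ["#", "\u266f", "\u00a3"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0023  NUMBER SIGN
# U+266F  MUSIC SHARP SIGN
# U+00A3  POUND SIGN
```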
J. P. Gilliver (John)
2017-10-08 16:28:21 UTC
In message <ordigd$82l$***@gioia.aioe.org>, Mayayana
<***@invalid.nospam> writes:
[]
Post by Mayayana
ASCII is standard in all uses and matches the same
numbers in unicode. It specifies a basic western character
set for byte values 0 to 127. ANSI uses a "local codepage" to
If we're strictly talking about _characters_, it's 32 to 126 (-:
[]
Post by Mayayana
In order to internationalize the Web with minimal upset,
UTF-8 became standard. It allows for encoding unicode 16
in a one-byte system. The first 128 values are still ASCII.
The second 128 are used to create values with up to 4
bytes. Thus all languages can be encoded in one system.
How does the receiving (and thus decoding) software know whether it's 2,
3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the
next one, two, or three bytes are part of the same character (somewhat
like, but also unlike, the shift characters in Baudot 5-bit code)?
[]
Post by Mayayana
What Harry is asking for is a simple way to convert
UTF-8 to ANSI using the standard English codepage. That
requires converting the string by parsing
the bytes. When the parser encounters bytes of 128+
it would need to decide how to treat them. Is it an
ANSI bullet, character 149 in English? Or is byte 149
one of 2, 3, or 4 bytes together indicating a character
in UTF-8? If it turns out to be, say, 3 bytes that render a
left curly quote in UTF-8, some kind of filter has to recognize
that exact pattern and say, "Oh, that's a quote. We'll just
substitute character 34 for those 3 bytes."
So Harry's solution has to treat each specific UTF-8
character and decide what to substitute. It's not a 1-to-1
correspondence. In other words, Notepad already translated
the UTF-8 to ANSI, but now it has to be transliterated.
Yes, it's not _that_ simple, though a many-to-1 (well, to-94) mapping
ought not to be impossible.
Post by Mayayana
If those quotes were written as character 34 in the first
place then the encoding would not matter. Everyone would
see ", because " is in the ASCII range.
The page he links says that MS Office products have an option
for fancy characters like curly quotes. Maybe that helps explain
why so many of them are on webpages. MS Office users are
among the most parochial of all computer users. They're usually
not tech-literate but are computer-literate. The result is millions
of people who equate their computer with MS Office and
assume the whole world also uses MS Office. They're the people
who send emails from Word or send a 60,000 byte DOC file to
communicate 1 sentence of 24 bytes. Many of those same people
are probably also creating webpages from MS Word, oblivious
of the travesty.
Yes, that's probably the cause.
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
Mayayana
2017-10-08 16:52:21 UTC
"J. P. Gilliver (John)" <G6JPG-***@255soft.uk> wrote

| > ASCII is standard in all uses and matches the same
| >numbers in unicode. It specifies a basic western character
| >set for byte values 0 to 127. ANSI uses a "local codepage" to
|
| If we're strictly talking about _characters_, it's 32 to 126 (-:

Ms. Line Return and Mr. Null might take offense to that. :)
In parsing they're all characters. Even a null. I sometimes
use character 1 as a marker in text programmatically
because it's neutral. It means nothing in English, formatting,
etc but it still acts as a character.

| How does the receiving (and thus decoding) software know whether it's 2,
| 3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the
| next one, two, or three bytes are part of the same character (somewhat
| like, but also unlike, the shift characters in Baudot 5-bit code)?

There are webpages about that. I actually wrote some
VBScript code for it a while back, but now I've forgotten.
It's a pain in the neck. :)

http://www.jsware.net/jsware/scrfiles.php5#u2a
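To answer John's question: roughly yes, but UTF-8 reserves bit patterns rather than three specific byte values. The high bits of the first byte give the length of the sequence. Every follow-on byte starts with the bits 10, so it can never be mistaken for a first byte. Below is a hand-written Python decoder, purely for illustration. Real code should just call bytes.decode("utf-8"), which also rejects overlong forms and surrogate code points that this sketch does not check.

```python
# Payload bits kept from the lead byte, by sequence length.
LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}

def sequence_length(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its first byte."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: start of a 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: start of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx is never valid
    raise ValueError(f"0x{lead:02X} cannot start a character")

def decode_utf8(data: bytes) -> str:
    chars = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence")
        code = data[i] & LEAD_MASK[n]
        for cont in data[i + 1 : i + n]:
            if cont >> 6 != 0b10:  # continuation bytes look like 10xxxxxx
                raise ValueError(f"0x{cont:02X} is not a continuation byte")
            code = (code << 6) | (cont & 0x3F)   # append 6 payload bits
        chars.append(chr(code))
        i += n
    return "".join(chars)

sample = "\u201cA\u201d \u00e9 \U0001D11E".encode("utf-8")
assert decode_utf8(sample) == sample.decode("utf-8")
```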
Wolf K
2017-10-08 17:48:53 UTC
Post by Mayayana
| > The OP wants to use a plain-text editor that only uses standard ASCII
| > (not "extended ASCII", or codes - i. e. characters between 32 and 126
| > decimal [plus newline]). He hasn't said why yet, but I understand what
| > he wants. (I was going to say "... like Notepad", but Notepad does allow
| > so-called "Extended ASCII", i. e. one particular set of the codes up to
| > 255.) He is hoping for something that will render such text into
| > nearest-equivalent (such as quotes that have directional qualities all
| > into code 34 decimal).
| > []
|
| NB: ASCII is not ANSI. Ansi is ASCII plus codes 128 to 255.
|
| Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
| quotes.
|
171 and 187 are double chevrons. That's not the same
as curly quotes. You're thinking of 147 and 148.
[...]

Thanks for the details.
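Python's Windows-1252 codec settles which byte values are which. 145-148 are the curly quotes, and 171/187 are the chevrons (guillemets).

```python
import unicodedata

# Decode single bytes with Python's Windows-1252 codec and name each character.
for value in (145, 146, 147, 148, 171, 187):
    ch = bytes([value]).decode("cp1252")
    print(value, f"U+{ord(ch):04X}", unicodedata.name(ch))

# 145 U+2018 LEFT SINGLE QUOTATION MARK
# 146 U+2019 RIGHT SINGLE QUOTATION MARK
# 147 U+201C LEFT DOUBLE QUOTATION MARK
# 148 U+201D RIGHT DOUBLE QUOTATION MARK
# 171 U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
# 187 U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
```

In ISO-8859-1 (Latin-1), bytes 128-159 are invisible control codes rather than quotes. That is one reason the same Windows bytes can turn to garbage elsewhere.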
--
Wolf K
kirkwood40.blogspot.com
"Wanted. Schrödinger’s Cat. Dead and Alive."
Peter Moylan
2017-10-09 04:56:22 UTC
Post by Mayayana
UTF-8 is a way to express unicode using single bytes.
Unicode-16, what's usually just referred to as unicode,
encodes thousands of characters in 2 bytes, so each character
can have its own specific encoding number in order to fit
English, Russian and everything else. ASCII and ANSI use
a one-byte-per-character encoding, except with a few
Asian languages.
Minor correction: Unicode has more than 2^16 code points, so you can't
express it using two bytes per character. (For full coverage you'd need
a 21-bit code.) An earlier version of Unicode did allow a 16-bit
representation, but that's now obsolete because it didn't cover the
characters of some languages.

There is an encoding of Unicode, called UTF-16, that can represent many
characters as single 16-bit codes, but for characters beyond U+FFFF it
has to use a pair of 16-bit "words" (a surrogate pair). In that respect
it is similar to UTF-8, which also uses variable-length encoding.

UTF-16 is a very poor choice for most Western languages, because for
typical text it uses almost twice as much space as UTF-8 does. It is,
however, a good choice for some Asian languages.

There is also an encoding called UTF-32, which is grossly inefficient
but which has the advantage that it doesn't need to be variable-length.
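A quick Python comparison of the storage cost of the three encodings. The "-le" codec variants are used so that Python doesn't prepend a byte-order mark. The sample strings are arbitrary.

```python
def sizes(text: str) -> dict:
    """Encoded length in bytes of text under UTF-8, UTF-16 and UTF-32."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(sizes("Plain English text"))  # {'utf-8': 18, 'utf-16-le': 36, 'utf-32-le': 72}
print(sizes("\u4e2d\u6587"))         # two CJK characters: {'utf-8': 6, 'utf-16-le': 4, 'utf-32-le': 8}
print(sizes("\U0001D11E"))           # one character beyond U+FFFF: 4 bytes in each

# In UTF-16, a character beyond U+FFFF becomes exactly two 16-bit units.
print("\U0001D11E".encode("utf-16-be").hex(" "))   # d8 34 dd 1e
```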
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
Mayayana
2017-10-09 05:26:05 UTC
"Peter Moylan" <***@pmoylan.org.invalid> wrote

| > UTF-8 is a way to express unicode using single bytes.
| > Unicode-16, what's usually just referred to as unicode,
| > encodes thousands of characters in 2 bytes, so each character
| > can have its own specific encoding number in order to fit
| > English, Russian and everything else. ASCII and ANSI use
| > a one-byte-per-character encoding, except with a few
| > Asian languages.
|
| Minor correction: Unicode has more than 2^16 code points, so you can't
| express it using two bytes per character. (For full coverage you'd need
| a 21-bit code.) An earlier version of Unicode did allow a 16-bit
| representation, but that's now obsolete because it didn't cover the
| characters of some languages.
|
I stand by my statement. It's true that there are various
unicode versions, but the only ones that anyone is likely to
encounter are UTF-8 in webpages and unicode-16 in computer
files. Windows runs on unicode-16, which is not UTF-16 but
rather a 2-byte-per-character system. It's hardly obsolete.
So while the bytes representing ABC would be 65-66-67 in
ASCII/ANSI/UTF-8, they're 65-00-66-00-67-00 in unicode
(Windows stores the low byte first).
If you look at any Windows DLL in a hex editor you
can see samples of both ASCII/ANSI and unicode. Things
like the import and export tables are in ASCII, while version
info strings are stored as unicode. Unicode is also the
standard format for strings in Windows programming.
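Python can show the byte-order question directly. As far as I know, modern Windows uses UTF-16 in little-endian order for its wide strings. For characters up to U+FFFF, which covers ABC and nearly all everyday text, that is byte-for-byte the same as the older fixed two-byte scheme (UCS-2).

```python
text = "ABC"
print(list(text.encode("ascii")))       # [65, 66, 67]
print(list(text.encode("utf-16-le")))   # [65, 0, 66, 0, 67, 0]  low byte first
print(list(text.encode("utf-16-be")))   # [0, 65, 0, 66, 0, 67]  high byte first

# Python's plain "utf-16" codec prepends a byte-order mark so a reader
# can tell which of the two orders was used.
bom = text.encode("utf-16")[:2]
assert bom in (b"\xff\xfe", b"\xfe\xff")
```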
Peter Moylan
2017-10-09 05:07:48 UTC
The result is millions of people who equate their computer with MS
Office and assume the whole world also uses MS Office. They're the
people who send emails from Word or send a 60,000 byte DOC file to
communicate 1 sentence of 24 bytes.
One of our previous university Vice-Chancellors used to send out "all
staff" memos using MS-Word. There were times when his one-paragraph memo
became a 2 MB e-mail, sent to over a thousand recipients. With the
hardware of the day that probably put significant stress on the mail server.

Why as much as 2 MB for a short memo? Because apparently he didn't know
how to create a new MS-Word document, so he would take an existing
document, delete its content, and then type the new text. What he didn't
realise was that, with an unfortunate choice of options (it might even
be the default choice), the document contains a record of all changes.
This became obvious to those of us who didn't have MS-Word installed and
had to read the raw data.

What also became obvious was that he was leaking some very confidential
data.
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
Mayayana
2017-10-09 05:29:50 UTC
"Peter Moylan" <***@pmoylan.org.invalid> wrote

| Why as much as 2 MB for a short memo? Because apparently he didn't know
| how to create a new MS-Word document, so he would take an existing
| document, delete its content, and then type the new text. What he didn't
| realise was that, with an unfortunate choice of options (it might even
| be the default choice), the document contains a record of all changes.
| This became obvious to those of us who didn't have MS-Word installed and
| had to read the raw data.
|
| What also became obvious was that he was leaking some very confidential
| data.

That seems to be a lesson that has had to be relearned
many times. I consider it a bug in MS Word that things
like user names and changes are stored. But I guess it
makes sense in the context of most Word users being
corporate employees who are using Word for work and
not for their own purposes.
If I remember correctly, that's also how the creator
of the Melissa virus got caught. He was just an office
worker with enough knowledge of VBA, Word's macro
language, to make a mischievous macro. What he didn't
realize was that his name was embedded in the file.
harry newton
2017-10-09 08:39:07 UTC
Post by Mayayana
If I remember correctly, that's also how the creator
of the Melissa virus got caught.
Yikes. Is my name in the MS Word document I already uploaded?
<http://www.filedropper.com/fixquote>

Anyway, I just now looked up how the Melissa-Virus guy got caught:
"Melissa virus turns 10"
<https://www.cnet.com/news/melissa-virus-turns-10/>

Salient points:
* March 26, 1999
* They scanned Usenet articles for viruses (they still do!)
* The Melissa-virus was posted as a zipped Word doc to alt.sex
* They tracked the NNTP posting host to an AOL account
* AOL provided the FBI the dial-in logs giving the phone number
* The account was compromised; the phone tracked to New Jersey
* The phone account owner was David L. Smith
* The article says "David L. Smith pleaded guilty" (plead? guilty)

It didn't say anything about the person's name being in the file.
Even so, your point is still valid that a name can be in the file.

Is my name in my Word file?
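If the file is a .docx or .docm (Office 2007 and later), it is a zip archive, and the author fields are stored in docProps/core.xml. Below is a minimal Python sketch that prints them. It assumes that layout and will not work on an old binary .doc, which keeps its summary information differently. In recent versions of Word, the Document Inspector (File > Info > Check for Issues) is the built-in way to check for and strip this data.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def author_fields(path) -> dict:
    """Return creator / lastModifiedBy from a .docx-style file (path or file object)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    fields = {}
    for tag in ("dc:creator", "cp:lastModifiedBy"):
        el = root.find(tag, NS)
        fields[tag] = el.text if el is not None else None
    return fields

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, author_fields(p))
```

Run it as, for example, "python author_fields.py somefile.docx" (the script name is hypothetical).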
David E. Ross
2017-10-08 15:39:28 UTC
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>
I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes drive
me nuts!
<http://i67.tinypic.com/2h5mjbr.jpg>
I tried cutting from the web and pasting into MS Word and then cutting from
MS Word and pasting into the text file - but the dastardly curly quotes
were still there.
I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.
Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?
Here's just one sample but the web is filled with dastardly curly quotes!
<http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating>
See <http://www.fourmilab.ch/webtools/demoroniser/>. This is a tool
that supposedly converts Microsoft "smart" characters to HTML-compatible
characters. Yes, it is 14 years old; and no, I have not tried it myself.
--
David E. Ross
<http://www.rossde.com/>

By allowing employers to eliminate coverage for birth control
from their insurance plans, President Trump has guaranteed there
will be an increase in the demand for abortions.
Peter Moylan
2017-10-09 05:27:24 UTC
See <http://www.fourmilab.ch/webtools/demoroniser/>. This is a tool
that supposedly converts Microsoft "smart" characters to HTML-compatible
characters. Yes, it is 14 years old; and no, I have not tried it myself.
There's nothing wrong with using old software; it doesn't go rusty, or
anything like that. You don't even have to dust it.

The other day I had to add a feature to my FTP server where a list of
users could be sorted by username, or user type, or timestamp of last
login. To do the sort I used a Quicksort module that I had written 21
years ago. I wasn't surprised that it worked without changes. If it
worked back then, it should work now.

There is a belief in some quarters that if there have been no bug fixes
for a program in the last year or so, then it is "abandonware". That is
annoying to those of us who choose not to insert the bugs in the first
place.
--
Peter Moylan http://www.pmoylan.org
Newcastle, NSW, Australia
harry newton
2017-10-09 08:48:39 UTC
Post by Peter Moylan
There's nothing wrong with using old software; it doesn't go rusty, or
anything like that. You don't even have to dust it.
The other day I had to add a feature to my FTP server where a list of
users could be sorted by username, or user type, or timestamp of last
login. To do the sort I used a Quicksort module that I had written 21
years ago. I wasn't surprised that it worked without changes. If it
worked back then, it should work now.
There is a belief in some quarters that if there have been no bug fixes
for a program in the last year or so, then it is "abandonware". That is
annoying to those of us who choose not to insert the bugs in the first
place.
I'm with you: tried-and-true software is just fine.
The text editor I use most is based on ed, which is as old as the hills.
It still works, and it's still extremely efficient at editing pure text.

BTW, as an aside on sorting: because Windows and Linux both have a "sort"
command, sorting lines is trivially easy, as in the vi-speak example
below:

:%!sort

Which means in vi-speak:
: => begin an ex (command-line) command
% => apply that command to every line in the file
! => filter those lines through an external program (i.e., one *outside* the editor), run via the shell
sort => the Windows (or Linux) "sort" command, whose output replaces the lines

I use this all the time to efficiently sort selected lines alphabetically.
(For a visual selection, Vim fills in the range '<,'> instead of %.)
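To see what the "!" is doing, note that any program that reads lines on standard input and writes lines to standard output can stand in for "sort". This minimal Python sketch is such a filter. The ignore_case option and the script name sortfilter.py are hypothetical. It uses plain code-point order, so the real Windows and Linux sort commands, which have their own locale and case rules, may order some lines differently.

```python
import sys


def sort_filter(text: str, ignore_case: bool = False) -> str:
    """Sort the lines of text, the way a `sort` filter would.

    Default order is plain code-point order (uppercase before lowercase);
    this is an assumption, not a copy of any real sort command's rules.
    """
    lines = text.splitlines()
    key = str.casefold if ignore_case else None
    # Re-add a newline to every line so a last line that lacked one
    # cannot get glued onto its new neighbour.
    return "".join(line + "\n" for line in sorted(lines, key=key))


if __name__ == "__main__" and not sys.stdin.isatty():
    # Vim's :%! writes the lines to our stdin and replaces them with our stdout.
    sys.stdout.write(sort_filter(sys.stdin.read()))
```

Saved as sortfilter.py, it should behave much like :%!sort when run as :%!python sortfilter.py, assuming python is on the PATH.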

The Microsoft "sort" command has a lot of options that I haven't used yet:
<https://technet.microsoft.com/en-us/library/bb491004.aspx>
Anton Shepelev
2017-10-08 20:41:45 UTC
Permalink
Harry Newton
Post by harry newton
How can we convert those dastardly curly quotes to
straight quotes on Windows?
<http://i67.tinypic.com/2h5mjbr.jpg>
I like to save into TEXT files on Windows technical
information cut and pasted from disjoint news articles
where the unprintable curly quotes drive me nuts!
You have my sympathy. The world has grown Unicode-crazy,
but I will not forgo my 8-bit plain-text files (not
7-bit, for I need English and Russian).
Since you are using Vim, the tool you need is already
at your fingertips -- just set up a macro to replace
the offending Unicode characters with their ASCII
representations. You might also want to rid yourself
of Unicode apostrophes and em-dashes.
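The replacement table such a macro needs is short. Here it is as a minimal Python sketch using str.translate, for anyone who would rather clean files outside the editor. The choice of characters and their ASCII stand-ins is an assumption (curly quotes, the curly apostrophe, en and em dashes), and the function name straighten is hypothetical.

```python
# Hypothetical replacement table: Unicode punctuation -> ASCII stand-ins.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (also the curly apostrophe)
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",   # EN DASH
    "\u2014": "--",  # EM DASH
})


def straighten(text: str) -> str:
    """Replace curly quotes, apostrophes and dashes with ASCII equivalents."""
    return text.translate(ASCII_EQUIVALENTS)
```

Inside Vim itself, substitutions along the lines of :%s/[\u2018\u2019]/'/g and :%s/[\u201c\u201d]/"/g should do the same job for the quotes, since Vim accepts \u escapes inside [] collections.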

P.S.: Never do that for true typography and be chary
with any documents meant for printing.
--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]
Jerry Friedman
2017-10-08 23:33:08 UTC
Permalink
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
...

Since you posted to a.u.e., I thought I'd inform you of the origin of
"dastard". It turned out to be different from what I expected, but here
it is anyway.

dastard (n.)
'mid-15c., "one who is lazy or dull;" an English formation on a
French model, probably from /dast/, "dazed," past participle of /dasen/
"to daze" (see /daze/ (v.)) + deprecatory suffix *-ard*. Meaning "one
who shirks from danger" is late 15c.'

http://www.etymonline.com/index.php?allowed_in_frame=0&search=dastard
--
Jerry Friedman
Rich Ulrich
2017-10-09 01:44:21 UTC
Permalink
On Sun, 8 Oct 2017 17:33:08 -0600, Jerry Friedman
Post by Jerry Friedman
Post by harry newton
How can we convert those dastardly curly quotes to straight quotes on Windows?
...
Since you posted to a.u.e., I thought I'd inform you of the origin of
"dastard". It turned out to be different from what I expected, but here
it is anyway.
dastard (n.)
'mid-15c., "one who is lazy or dull;" an English formation on a
French model, probably from /dast/, "dazed," past participle of /dasen/
"to daze" (see /daze/ (v.)) + deprecatory suffix *-ard*. Meaning "one
who shirks from danger" is late 15c.'
http://www.etymonline.com/index.php?allowed_in_frame=0&search=dastard
Someone might tell the North Koreans that "dastard" would be a more
precisely applied insult than "dotard": dazed, and lazy or dull.
--
Rich Ulrich
Loading...