Variable naming issue

maximus fl

2017-04-24 23:17:26 UTC

This is odd. gcc and clang both compile this program with no problems, and the program runs just fine. I can't find a reason why.

#include <stdio.h>

int main(void) {
    int A$ = 10;
    int $B = 25;
    int z = $B;
    int c = 0;

    $B = A$ + $B;
    printf("A$ is: %i and $B is: %i and the sum is %i\n", A$, z, $B);
    c = A$ + $B;
    printf("c :=> %i\n", c);

    return 0;
}

This works, but the C standard says this variable name should not be valid. Other symbols I have tried cause a compiler error, but this one does not. Does anyone know why? Is there something special about the $? Perhaps $ is a special character in C, but I can't locate anything about it. Does anyone have any thoughts on this? Thanks in advance.

Stefan Ram

2017-04-24 23:23:24 UTC

Post by maximus fl
int A$ = 10;

6.4.2 Identifiers

6.4.2.1 General

Syntax

1   identifier:
        identifier-nondigit
        identifier identifier-nondigit
        identifier digit

    identifier-nondigit:
        nondigit
        universal-character-name
        other implementation-defined characters
        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Stefan Ram

2017-04-24 23:51:06 UTC

Post by Stefan Ram
other implementation-defined characters

For example, an ABI or a library generated by some program
or from a program in some other language might use identifiers
with a dollar sign »$«. When this is allowed in C, it allows
programs written in C to be linked with such libraries or
object files.
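
The grammar quoted above can be read as a small checker. What follows is only a sketch under stated assumptions: the helper name is_identifier and its extra parameter are made up for illustration, ASCII letter ranges are assumed, and universal character names are left out. The extra string stands in for the "other implementation-defined characters", so passing "$" models an implementation like gcc and passing "" models one that rejects the dollar sign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* "nondigit" in the grammar: underscore and the 52 Latin letters (ASCII assumed). */
static bool is_nondigit(char c)
{
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Hypothetical helper: checks s against the identifier grammar.
 * extra lists the implementation-defined identifier characters that this
 * imaginary implementation accepts, for example "$", or "" for none.
 * Universal character names (\uXXXX) are not handled in this sketch.
 */
static bool is_identifier(const char *s, const char *extra)
{
    assert(s != NULL && extra != NULL);
    if (s[0] == '\0')
        return false;                     /* an identifier is never empty */
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        bool nondigit = is_nondigit(c) || strchr(extra, c) != NULL;
        if (i == 0 && !nondigit)
            return false;                 /* must start with an identifier-nondigit */
        if (!nondigit && !is_digit(c))
            return false;                 /* later characters: nondigit or digit */
    }
    return true;
}
```

With extra set to "$", both A$ and $B from the original program pass; with extra set to "", both are rejected, which is the behaviour of a compiler that does not offer the extension.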

Keith Thompson

2017-04-25 00:26:38 UTC

Post by Stefan Ram

Post by maximus fl
int A$ = 10;

6.4.2 Identifiers
6.4.2.1 General
Syntax
1   identifier:
        identifier-nondigit
        identifier identifier-nondigit
        identifier digit
    identifier-nondigit:
        nondigit
        universal-character-name
        other implementation-defined characters
        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Dollar-Signs.html

In GNU C, you may normally use dollar signs in identifier
names. This is because many traditional C implementations allow
such identifiers. However, dollar signs in identifiers are not
supported on a few target machines, typically because the target
assembler does not allow them.

--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

maximus fl

2017-04-25 01:18:56 UTC

Post by Keith Thompson

Post by Stefan Ram

Post by maximus fl
int A$ = 10;

6.4.2 Identifiers
6.4.2.1 General
Syntax
1   identifier:
        identifier-nondigit
        identifier identifier-nondigit
        identifier digit
    identifier-nondigit:
        nondigit
        universal-character-name
        other implementation-defined characters
        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Dollar-Signs.html
In GNU C, you may normally use dollar signs in identifier
names. This is because many traditional C implementations allow
such identifiers. However, dollar signs in identifiers are not
supported on a few target machines, typically because the target
assembler does not allow them.

Thanks! This answers my question. Thanks for the GNU C reference.

David Kleinecke

2017-04-24 23:28:34 UTC

Post by maximus fl
This is odd. gcc and clang both compile this program with no problems, and the program runs just fine. I can't find a reason why.
#include <stdio.h>
int main(void) {
int A$ = 10;
int $B = 25;
int z = $B;
int c = 0;
$B = A$ + $B;
printf("A$ is: %i and $B is: %i and the sum is %i\n", A$, z, $B);
c = A$ +$B;
printf("c :=> %i\n", c);
return 0;
}
This works, but the C standard says this variable name should not be valid. Other symbols I have tried cause a compiler error, but this one does not. Does anyone know why? Is there something special about the $? Perhaps $ is a special character in C, but I can't locate anything about it. Does anyone have any thoughts on this? Thanks in advance.

It wouldn't pass my compiler. "int $B = 25;" would generate the
token sequence
    INT
    DOLLAR
    token named B
    EQUAL
    token whose address contains 25
    SEMICOLON
Then, after seeing INT, the compiler expects one of
    another specifier
    an identifier
    OPEN
    STAR
Anything else causes a fatal error. (I wrote that from memory,
sorry about any errors.)

Other compilers may not go about things so directly.
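
For readers who want to see that behaviour concretely, here is a rough sketch of a tokenizer that treats $ as a token of its own, plus a check of what may follow INT. It is not the poster's compiler: the names tokenize, declaration_starts_ok and the TOK_* kinds are invented for illustration, and only the handful of tokens needed for the example line are recognised.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum token_kind {
    TOK_INT, TOK_IDENT, TOK_NUMBER, TOK_EQUAL, TOK_SEMICOLON,
    TOK_DOLLAR, TOK_OPEN, TOK_STAR, TOK_OTHER
};

/* Splits src into token kinds, storing at most max of them in out.
 * Returns the number of tokens stored. '$' is never part of a name here. */
static size_t tokenize(const char *src, enum token_kind *out, size_t max)
{
    size_t n = 0;
    assert(src != NULL && out != NULL);
    while (*src != '\0' && n < max) {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) {
            src++;
        } else if (isalpha(c) || c == '_') {
            const char *start = src;
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            size_t len = (size_t)(src - start);
            /* The only keyword this sketch knows is "int". */
            out[n++] = (len == 3 && strncmp(start, "int", 3) == 0)
                           ? TOK_INT : TOK_IDENT;
        } else if (isdigit(c)) {
            while (isdigit((unsigned char)*src))
                src++;
            out[n++] = TOK_NUMBER;
        } else {
            switch (c) {
            case '$': out[n++] = TOK_DOLLAR;    break;
            case '=': out[n++] = TOK_EQUAL;     break;
            case ';': out[n++] = TOK_SEMICOLON; break;
            case '(': out[n++] = TOK_OPEN;      break;
            case '*': out[n++] = TOK_STAR;      break;
            default:  out[n++] = TOK_OTHER;     break;
            }
            src++;
        }
    }
    return n;
}

/* After INT, accept only another specifier, an identifier, OPEN or STAR. */
static bool declaration_starts_ok(const enum token_kind *toks, size_t n)
{
    if (n < 2 || toks[0] != TOK_INT)
        return false;
    return toks[1] == TOK_INT || toks[1] == TOK_IDENT ||
           toks[1] == TOK_OPEN || toks[1] == TOK_STAR;
}
```

Fed "int $B = 25;", this produces INT, DOLLAR, an identifier, EQUAL, a number and SEMICOLON, and the check after INT fails on DOLLAR, which matches the fatal error described in the post.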

bartc

2017-04-25 00:09:35 UTC

Post by maximus fl
This is odd. gcc and clang both compile this program with no problems, and the program runs just fine. I can't find a reason why.
#include <stdio.h>
int main(void) {
int A$ = 10;
int $B = 25;
int z = $B;
int c = 0;
$B = A$ + $B;
printf("A$ is: %i and $B is: %i and the sum is %i\n", A$, z, $B);
c = A$ +$B;
printf("c :=> %i\n", c);
return 0;
}
This works, but the C standard says this variable name should not be valid. Other symbols I have tried cause a compiler error, but this one does not. Does anyone know why? Is there something special about the $? Perhaps $ is a special character in C, but I can't locate anything about it. Does anyone have any thoughts on this? Thanks in advance.

Most compilers I've tried accept $ in names. Some don't (eg. Tiny C), so
it is necessary to avoid them in order to have code that compiles on
anything.

It's silly really because it can be literally a one-line change in a
compiler to allow $.

--
bartc

Noob

2017-04-25 07:37:52 UTC

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

If the latter, did you write the assembler, or do you rely
on the system's assembler?

Maybe you shouldn't assume you're smarter than everyone else.
(Or maybe I should drop you back in my KF, most of your posts
just raise my blood pressure.)

bartc

2017-04-25 09:40:06 UTC

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.

We're in the 21st century not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

Post by Noob
If the latter, did you write the assembler, or do you rely
on the system's assembler?

Let me reiterate, in either a compiler /or/ assembler, support for $ can
be a one-line change from:

....
case 'y':
case 'z':

to:

....
case 'y':
case 'z':
case '$':

In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode), then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

Post by Noob
Maybe you shouldn't assume you're smarter than everyone else.
(Or maybe I should drop you back in my KF, most of your posts
just raise my blood pressure.)

Some posts raise /mine/ notably ones from G Owen. But those are
sustained personal insults.

--
Bartc

David Brown

2017-04-25 11:07:15 UTC

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.
We're in the 21st century not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

Post by Noob
If the latter, did you write the assembler, or do you rely
on the system's assembler?

Let me reiterate, in either a compiler /or/ assembler, support for $ can
....
....
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.

Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?

I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

Post by bartc
then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

There are /many/ assemblers out there. Your experience is with a couple
of x86 tools and maybe a brief look at something for ARM. Extrapolating
from your narrow (albeit in-depth) experience as though it were a
general rule makes you look silly.

Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.
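
To make the conflict problem concrete, here is a minimal sketch of one possible escape scheme. It is purely hypothetical (the function name mangle and the choice of "_S" and "__" are assumptions, not anything a real compiler is known to do). A naive scheme that only rewrites $ as "_S" would map both A$ and a user's A_S to the same assembler symbol; avoiding that means escaping the underscore as well, which changes every existing name that contains one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical escape scheme for an assembler that rejects '$':
 *   '$' becomes "_S" and '_' becomes "__".
 * Escaping '_' too is what keeps the mapping collision-free.
 * Returns false if out (cap bytes, including the terminator) is too small.
 */
static bool mangle(const char *name, char *out, size_t cap)
{
    size_t n = 0;
    assert(name != NULL && out != NULL);
    if (cap == 0)
        return false;
    for (; *name != '\0'; name++) {
        if (*name == '$' || *name == '_') {
            if (n + 2 >= cap)
                return false;             /* no room for 2 chars + '\0' */
            out[n++] = '_';
            out[n++] = (*name == '$') ? 'S' : '_';
        } else {
            if (n + 1 >= cap)
                return false;             /* no room for 1 char + '\0' */
            out[n++] = *name;
        }
    }
    out[n] = '\0';
    return true;
}
```

Under this scheme A$ becomes A_S while a user-written A_S becomes A__S, so the two no longer clash, but an ordinary name such as my_var now shows up as my__var in listings, map files and debuggers, and would no longer link against objects built by tools that do not apply the same rule.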

You will probably also find that using a $ in identifiers causes
challenges or incompatibilities with all sorts of other tools, such as
IDEs with syntax highlighting, tag generators, automatic code
documenters, etc.

And thus today's situation - that some compilers reject $ altogether,
and that others accept it if the backend assembler is happy, is the best
that can be done. This is for practical and sensible reasons - not
because compiler writers are lazy, stupid, stuck in the 70's, or did not
think of adding an extra "case" line to their software.

Post by bartc

Post by Noob
Maybe you shouldn't assume you're smarter than everyone else.
(Or maybe I should drop you back in my KF, most of your posts
just raise my blood pressure.)

Some posts raise /mine/ notably ones from G Owen. But those are
sustained personal insults.

bartc

2017-04-25 12:24:15 UTC

Post by David Brown

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

So you're saying they're all different? In which case, the answer is to
cripple the language to work to the lowest common denominator? If one
assembler only works in upper case, the language should only allow upper
case too.

There are sometimes external influences, for example linkers limiting
the lengths of global names. But even there it is possible to work
around the restrictions (as MS-DOS did with allowing longer, free-format
filenames within the 8.3 filename format).

Anyway these are all very easy to fix. How long does it take to write an
assembler? It must be a lot easier than writing a compiler!

Post by David Brown
There are /many/ assemblers out there. Your experience is with a couple
of x86 tools and maybe a brief look at something for ARM. Extrapolating
from your narrow (albeit in-depth) experience as though it were a
general rule makes you look silly.

In the 70s I used assemblers for 6800, some 16-bit machine, and for
PDP10. I just checked the manual for the latter's assembler ('macro10'),
and it allows dollars in symbol names.

In the 80s, I wrote assemblers for 8051, Z80 and 80186. In the 90s, I
wrote inline assemblers for 80386. For a while I wrote my own linkers.

Very narrow? Maybe. But we're talking about allowing a $ as part of
name; it's rocket science.

Post by David Brown
Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.

Again, these should not be a consideration when designing a language (if
it's successful, the tools will follow), otherwise nothing would ever
progress.

(BTW have you checked whether those tools do in fact support $? The fact
that gcc does so, suggests that they do.)

Post by David Brown
And thus today's situation - that some compilers reject $ altogether,
and that others accept it if the backend assembler is happy, is the best
that can be done.

In that case why bother at all?

I'm in the annoying situation where I like to use $ rather than _ for
special 'decoration', but one or two out of the six C compilers I test
with don't support $. Even though they support the same targets.

The odd one out is Tiny C, which is too useful a compiler to decide not
to support. There is also lccwin, which supports $ but not at the start
of a name.

Post by David Brown
This is for practical and sensible reasons - not
because compiler writers are lazy, stupid, stuck in the 70's, or did not
think of adding an extra "case" line to their software.

In the case of Tiny C, I think everything is done internally. So what's
the excuse, saving a few bytes of code? It can't be that desperate to
stay tiny!

--
bartc

bartc

2017-04-25 12:27:07 UTC

... we're talking about allowing a $ as part of
[a] name; it's rocket science.

And rocket science is child's play!

But you know what I meant ...

Ben Bacarisse

2017-04-25 12:59:32 UTC

Post by bartc

Post by David Brown

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

<snip>

Post by bartc
In the case of Tiny C, I think everything is done internally. So
what's the excuse, saving a few bytes of code? It can't be that
desperate to stay tiny!

Excuse? Is that also a genuine opinion -- any C compiler that does not
permit $ in an identifier should have an *excuse* for not doing so?

Whatever their "excuse" it's not code size since tcc /does/ use $, it's
just they decided to use it differently.

--
Ben.

bartc

2017-04-25 13:20:35 UTC

Post by Ben Bacarisse

Post by bartc
So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

No not literally. And not specifically that minor issue.

C would look very different though if $ had been used for system names
rather than the hard-to-see _. (Both still ugly, but at least it's
easier to spot $.)

It is rather strange: $ is one of the 96 printable ASCII characters,
it's even one of the 64 you got on a teletype, yet it hardly figures in
the language.

Post by Ben Bacarisse
<snip>

Post by bartc
In the case of Tiny C, I think everything is done internally. So
what's the excuse, saving a few bytes of code? It can't be that
desperate to stay tiny!

Excuse? Is that also a genuine opinion -- any C compiler that does not
permit $ in an identifier should have an *excuse* for not doing so?

Yes, when most others do support it, then you might well ask why not.
Especially if you have code that's been using $, and it doesn't compile
because of it.

Post by Ben Bacarisse
Whatever their "excuse" it's not code size since tcc /does/ use $, it's
just they decided to use it differently.

In what way? I've seen it in the source code, but for something called
an 'ldname'.

--
bartc

Ben Bacarisse

2017-04-25 13:27:36 UTC

<snip>

Post by bartc

Post by Ben Bacarisse
Whatever their "excuse" it's not code size since tcc /does/ use $, it's
[just] they decided to use it differently.

In what way? I've seen it in the source code, but for something called
an 'ldname'.

I don't know, but it is pre-processed as a single-character token so
using it in names would require a change rather than an addition.

--
Ben.

Keith Thompson

2017-04-25 18:52:58 UTC

Post by Ben Bacarisse
<snip>

Post by bartc

Post by Ben Bacarisse
Whatever their "excuse" it's not code size since tcc /does/ use $, it's
[just] they decided to use it differently.

In what way? I've seen it in the source code, but for something called
an 'ldname'.

I don't know, but it is pre-processed as a single-character token so
using it in names would require a change rather than an addition.

The $ character isn't part of the basic source or execution
character set, so it needn't be pre-processed at all. A conforming
C implementation could easily exist on a system that doesn't support
the $ character at all. Both ASCII and EBCDIC include it, so that's
not likely in practice -- but in the past there might have been
ASCII-based character sets that substitute some other character
for $ (I'm not sufficiently interested to verify that).

Sure, it might be a one-line compiler change to permit $ in
identifiers. Actually more than that since it would need to
be diagnosed in conforming mode, unless you want to change the
language to allow it unconditionally -- and even then real-world C
compilers are going to have modes to enforce conformance to earlier
C standards. And if C identifiers are mapped directly to assembly
language identifiers and/or linker symbols, there will have to be
additional code to generate valid identifiers if the assembler or
linker doesn't like $, or doesn't like it exactly the same way the
C compiler does (can it start an identifier?).

A similar level of work could allow @ and ` in identifiers. But then
allowing ` would interfere with Markdown, which is commonly used
to quote snippets of C source code, and @ might cause other problems.

Even ignoring all that, the amount of work required to implement
a feature is only one of a number of considerations that should
affect the decision of whether to support that feature.

I've never thought that C's problem is that it doesn't use enough
punctuation characters.


Malcolm McLean

2017-04-25 20:40:39 UTC

Post by Keith Thompson
I've never thought that C's problem is that it doesn't use enough
punctuation characters.

Basic uses $ to tag string variables, and some versions used & for
integers. Perl uses $ for scalars and @ for arrays, but it's notorious
for being hard to read.

bartc

2017-04-25 21:14:58 UTC

Post by Keith Thompson
Even ignoring all that, the amount of work required to implement
a feature is only one of a number of considerations that should
affect the decision of whether to support that feature.

Yet, it's supported by DMC. By gcc. By MSVC. By Pelles C. By Clang. By
nearly all the C compilers on Godbolt.

With a feature already so widespread, it becomes problematic /not/
having it.

Post by Keith Thompson
I've never thought that C's problem is that it doesn't use enough
punctuation characters.

It wouldn't be a punctuation character. "$$$" would be a single
identifier just like "___". Not three successive symbols.

--
bartc

David Brown

2017-04-25 22:16:02 UTC

Post by bartc

Post by Keith Thompson
Even ignoring all that, the amount of work required to implement
a feature is only one of a number of considerations that should
affect the decision of whether to support that feature.

Yet, it's supported by DMC. By gcc. By MSVC. By Pelles C. By Clang. By
nearly all the C compilers on Godbolt.
With a feature already so widespread, it becomes problematic /not/
having it.

It would only be a problem if the /use/ of $ in identifiers in C was
widespread, rather than just the compiler support for it. Further, the
$ would need to be used in code that is not bound to particular
compilers in other ways (such as requiring the modern C11 standard, or
support for multiple target cpus, or other compiler extensions).

Post by bartc

Post by Keith Thompson
I've never thought that C's problem is that it doesn't use enough
punctuation characters.

C++ could perhaps do with support for more symbols/punctuation for new
operators, but C has enough IMHO.

Post by bartc
It wouldn't be a punctuation character. "$$$" would be a single
identifier just like "___". Not three successive symbols.

That's true, logically. But visually, the $ reads like punctuation or a
symbol, not unlike _ (except more disruptive to read).

bartc

2017-04-25 23:13:38 UTC

Post by David Brown

Post by bartc

Post by Keith Thompson
Even ignoring all that, the amount of work required to implement
a feature is only one of a number of considerations that should
affect the decision of whether to support that feature.

Yet, it's supported by DMC. By gcc. By MSVC. By Pelles C. By Clang. By
nearly all the C compilers on Godbolt.
With a feature already so widespread, it becomes problematic /not/
having it.

It would only be a problem if the /use/ of $ in identifiers in C was
widespread, rather than just the compiler support for it. Further, the
$ would need to be used in code that is not bound to particular
compilers in other ways (such as requiring the modern C11 standard, or
support for multiple target cpus, or other compiler extensions).

It's Catch-22: I don't use $ in any of my C code that I want anyone else
to compile. /Because/ support is not guaranteed.

(Which is a bit of a nuisance. If I need a series of temps, they end up
being called _1, _2, _3, instead of $1, $2, $3.)

David Brown

2017-04-26 08:10:56 UTC

Post by bartc

Post by David Brown

Post by bartc

Post by Keith Thompson
Even ignoring all that, the amount of work required to implement
a feature is only one of a number of considerations that should
affect the decision of whether to support that feature.

Yet, it's supported by DMC. By gcc. By MSVC. By Pelles C. By Clang. By
nearly all the C compilers on Godbolt.
With a feature already so widespread, it becomes problematic /not/
having it.

It would only be a problem if the /use/ of $ in identifiers in C was
widespread, rather than just the compiler support for it. Further, the
$ would need to be used in code that is not bound to particular
compilers in other ways (such as requiring the modern C11 standard, or
support for multiple target cpus, or other compiler extensions).

It's Catch-22: I don't use $ in any of my C code that I want anyone else
to compile. /Because/ support is not guaranteed.

In the C world, there are standards to say what is guaranteed for a wide
range of tools. There are common extensions that are available on many
tools - the C standard even mentions some, including the use of $ in
identifiers (J.5.2 if you want to look it up). And of course not all
tools support the standards as well as might be wished. But the
standards are the nearest there is to giving the guaranteed behaviour
for a C compiler - and they don't include $ in identifiers.

Post by bartc
(Which is a bit of a nuisance. If I need a series of temps, they end up
being called _1, _2, _3, instead of $1, $2, $3.)

I can't see any problem with that. You are talking about temporary
identifiers generated in intermediary code by a translator, which will
normally only be seen by the backend C compiler. Pretty much the only
person who will ever see these identifiers will be /you/, as the creator
and debugger of the translator program. _1, _2, _3 are fine names - as
are tXwZ_1, tXwZ_2, etc., and anything else unlikely to clash with a
user symbol. (It is absolutely fine for you, in specifying your
language, to say that any identifiers starting with tXwZ are reserved
for the implementation.)
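
As a small illustration of the reserved-prefix idea, a translator could generate its temporaries and police user names along these lines. This is only a sketch: the prefix tXwZ_ is the one suggested in the post, while the function names next_temp_name and is_reserved_name are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Prefix reserved by the (hypothetical) source language for generated names. */
#define TEMP_PREFIX "tXwZ_"

/* Writes the next temporary name (tXwZ_1, tXwZ_2, ...) into buf.
 * Returns its length, or -1 if buf (cap bytes) is too small. */
static int next_temp_name(char *buf, size_t cap, unsigned *counter)
{
    int len;
    assert(buf != NULL && counter != NULL);
    len = snprintf(buf, cap, TEMP_PREFIX "%u", ++*counter);
    return (len < 0 || (size_t)len >= cap) ? -1 : len;
}

/* The translator would reject user identifiers for which this is true,
 * so generated names can never clash with user symbols. */
static bool is_reserved_name(const char *name)
{
    assert(name != NULL);
    return strncmp(name, TEMP_PREFIX, sizeof TEMP_PREFIX - 1) == 0;
}
```

Because the generated names use only letters, digits and the underscore, the output stays within what every conforming C compiler must accept, which is the point being made about not needing $ here.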

If you think the use of $ is better than alternatives, and it works with
the backends you want to support, then use $. If you want to use
backends that don't work with $, then don't use it. It is that simple.

s***@casperkitty.com

2017-04-25 21:33:23 UTC

Post by Keith Thompson
The $ character isn't part of the basic source or execution
character set, so it needn't be pre-processed at all. A conforming
C implementation could easily exist on a system that doesn't support
the $ character at all. Both ASCII and EBCDIC include it, so that's
not likely in practice -- but in the past there might have been
ASCII-based character sets that substitute some other character
for $ (I'm not sufficiently interested to verify that).

Many machines don't have all characters of the C source character set,
but a simple way to handle that without having to get into wackiness
like trigraphs would be to require every implementation to identify, for
each member of the source character set, at least one character or
sequence thereof which may be used to represent it. If an implementation's
source character set would depict code 0x5F as a back-arrow but some other
character code would appear as a line at the bottom of the character cell,
an implementation might accept either or both as being equivalent to an
underscore, provided that it documents such behavior.

Supplement that rule with one that allows implementations to define
additional "identifier" characters which may be used in user or
implementation-defined identifiers (but not in any Standard-defined ones)
and that should pretty well take care of things. If one system allows
external names with @ signs, and if code might need to define or use such
identifiers to interact with other parts of the system, an implementation
for that system should allow @ within names. If another system would not
allow such names, an implementation for that system should not allow the
character to appear in external labels, and most applications would have
relatively little need to use it for internal symbols either.

Post by Keith Thompson
Sure, it might be a one-line compiler change to permit $ in
identifiers. Actually more than that since it would need to
be diagnosed in conforming mode, unless you want to change the
language to allow it unconditionally -- and even then real-world C
compilers are going to have modes to enforce conformance to earlier
C standards. And if C identifiers are mapped directly to assembly
language identifiers and/or linker symbols, there will have to be
additional code to generate valid identifiers if the assembler or
linker doesn't like $, or doesn't like it exactly the same way the
C compiler does (can it start an identifier?).

The primary purpose for using a $ within an exported variable name would
be to expose a name containing that character to outside code in cases
where it would be useful to do so. Since the usefulness of such names
would likely depend upon factors outside the compiler's control, there's
no reason the compiler should need to worry about the particulars of
whether particular names are apt to be useful. If a linker is going to
choke on a given name, that can be left to the linker rather than the
compiler.

Post by Keith Thompson
allowing ` would interfere with Markdown, which is commonly used

Simply recognize that an implementation may define as an extension
additional characters that would--in the author's judgment--make it most
useful for its intended purpose, and recognize that code which uses such
characters will be incompatible with implementations that don't support
them.

Malcolm McLean

2017-04-25 13:39:06 UTC

Post by bartc
C would look very different though if $ had been used for system names
rather than the hard-to-see _. (Both still ugly, but at least it's
easier to spot $.)
It is rather strange: $ is one of the 96 printable ASCII characters,
it's even one of the 64 you got on a teletype, yet it hardly figures in
the language.

This was the 1970s. The dollar sign was still intimately tied to the currency
and didn't appear on many non-American keyboards.

Robert Wessel

2017-04-25 14:35:22 UTC

Post by bartc

Post by Ben Bacarisse

Post by bartc
So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

No not literally. And not specifically that minor issue.
C would look very different though if $ had been used for system names
rather than the hard-to-see _. (Both still ugly, but at least it's
easier to spot $.)

VMS did that. The names with the dollar signs are *much* uglier. At
least, IMO.

Gareth Owen

2017-04-25 17:39:53 UTC

Post by Ben Bacarisse

Post by bartc
So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

Trolls gotta troll.

bartc

2017-04-25 17:49:53 UTC

Post by Gareth Owen

Post by Ben Bacarisse

Post by bartc
So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

Trolls gotta troll.

****

David Kleinecke

2017-04-25 18:19:40 UTC

Post by Gareth Owen

Post by Ben Bacarisse

Post by bartc
So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

Can I confirm that this is an accurate statement of your opinion -- that
a language (maybe only C) is literally *crippled* because you can't rely
on using $ in an identifier? I ask because I think you sometimes just
say stuff to get a rise from people.

Trolls gotta troll.

****

The standard says the compiler tokenizes in translation phase 3 of
the pre-processor. After that it doesn't matter what the source
presentation looked like. Phases 1 and 2 are relatively trivial so
it would be fair to say that a compiler loses all dependence on the
appearance of the source immediately.

The standard also describes what might be called one possible source
shape discipline. For any other isomorphic source (there is a one-to-
one mapping of external symbols) all one needs to do is to add a pre-
pre-processor that changes one's favorite source shape into the
standard's shape.

For example one might write " AND " instead of "&&" and the
pre-pre-processor would change that to "&&".

But I would prefer to roll phases 1 and 2 into 3 and simply modify the
tokenizer to accept my set of source "tokens".

bartc

2017-04-25 18:31:12 UTC

Post by David Kleinecke
The standard says the compiler tokenizes in translation phase 3 of
the pre-processor. After that it doesn't matter what the source
presentation looked like. Phases 1 and 2 are relatively trivial so
it would be fair to say that a compiler loses all dependence on the
appearance of the source immediately.
The standard also describes what might be called one possible source
shape discipline. For any other isomorphic source (there is a one-to-
one mapping of external symbols) all one needs to do is to add a pre-
pre-processor that changes one's favorite source shape into the
standard's shape.
For example one might write " AND " instead of "&&" and the
pre-pre-processor would change that to "&&".

Yes, you can create a different syntax for the language /but you are
still writing C/.

Post by David Kleinecke
But I would prefer to roll phases 1 and 2 into 3 and simply modify the
tokenizer to accept my set of source "tokens".

But there is a problem: suppose A writes a program using this new
'source shape', and wants to share it with B. What does A have to give B?

If A was writing plain C, then just C source suffices. But now A has to
provide extra tools to do this extra pre-processing. If B wants to read
and maintain the program, then there is also the question of B having to
understand this new syntax.

(But maybe, when source already seems to rely on so much scripting
anyway, that is outside of the language, this might not be that big a deal.)

--
bartc

Ben Bacarisse

2017-04-25 22:36:48 UTC

David Kleinecke <***@gmail.com> writes:
<snip>

Post by David Kleinecke
The standard says the compiler tokenizes in translation phase 3 of
the pre-processor. After that it doesn't matter what the source
presentation looked like.

That's not absolutely correct. In translation phase 4 the source
presentation can become relevant again due to the # macro operator.

<snip>

Post by David Kleinecke
For example one might write " AND " instead of "&&" and the
pre-pre-processor would change that to "&&".
But I would prefer to roll phases 1 and 2 into 3 and simply modify the
tokenizer to accept my set of source "tokens".

Since this is your design, you can do what you like with something like

#define s(t) #t
s(AND)

but converting AND to && is out of step with existing practice. Where C
currently has alternate spellings for tokens, these must be recovered
when a token is "stringized".
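
A minimal sketch of that last point, using only standard C. The digraph `<:` is an alternate spelling of `[`, and the two behave identically as tokens, yet `#` reproduces whichever spelling appeared in the source. The macro `s` is the one from the post above; the function names are hypothetical and exist only for illustration.

```c
#define s(t) #t

/* Stringizing the digraph keeps the digraph spelling. */
const char *spelling_of_digraph(void) { return s(<:); }

/* Stringizing the primary token keeps the primary spelling. */
const char *spelling_of_bracket(void) { return s([); }

/* Outside of # the two spellings are interchangeable:
   this declares and indexes an ordinary array. */
int third_element(void)
{
    int a<:3:> = {1, 2, 3};
    return a<:2:>;
}
```

So a tokenizer that silently rewrote an alternate spelling into its primary form would change what `s(<:)` produces.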

--
Ben.

David Kleinecke

2017-04-26 00:04:21 UTC

Post by Ben Bacarisse
<snip>

Post by David Kleinecke
The standard says the compiler tokenizes in translation phase 3 of
the pre-processor. After that it doesn't matter what the source
presentation looked like.

That's not absolutely correct. In translation phase 4 the source
presentation can become relevant again due to the # macro operator.
<snip>

Post by David Kleinecke
For example one might write " AND " instead of "&&" and the
pre-pre-processor would change that to "&&".
But I would prefer to roll phases 1 and 2 into 3 and simply modify the
tokenizer to accept my set of source "tokens".

Since this is your design, you can do what you like with something like
#define s(t) #t
s(AND)
but converting AND to && is out of step with existing practice. Where C
currently has alternate spellings for tokens, these must be recovered
when a token is "stringized".

Of course, using AND for && would be perverse - but you know
people.

You are right about phase 4. A macro can create a new token and
that must be passed through phases 1, 2 and 3. This is a pain in the
neck but not actually very difficult.

Robert Wessel

2017-04-26 05:32:53 UTC

On Tue, 25 Apr 2017 17:04:21 -0700 (PDT), David Kleinecke

Post by David Kleinecke

Post by Ben Bacarisse
<snip>

Post by David Kleinecke
The standard says the compiler tokenizes in translation phase 3 of
the pre-processor. After that it doesn't matter what the source
presentation looked like.

That's not absolutely correct. In translation phase 4 the source
presentation can become relevant again due to the # macro operator.
<snip>

Post by David Kleinecke
For example one might write " AND " instead of "&&" and the
pre-pre-processor would change that to "&&".
But I would prefer to roll phases 1 and 2 into 3 and simply modify the
tokenizer to accept my set of source "tokens".

Since this is your design, you can do what you like with something like
#define s(t) #t
s(AND)
but converting AND to && is out of step with existing practice. Where C
currently has alternate spellings for tokens, these must be recovered
when a token is "stringized".

Of course, using AND for && would be perverse - but you know
people.

OTOH, using "and" for && only requires including iso646.h. The exact
level of perversity that entails I'll leave for others to judge.
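
For readers who have not met it, a small sketch of what `<iso646.h>` gives you. The header defines `and` as a macro expanding to `&&`; because the operand of `#` is not macro-expanded, stringizing it still yields the spelling `and`. The function names here are hypothetical, chosen only for the example.

```c
#include <iso646.h>

#define s(t) #t

/* `and` expands to &&, so this is an ordinary logical AND. */
int both_set(int a, int b) { return a and b; }

/* The argument of # is taken as written, so this is "and", not "&&". */
const char *spelling_of_and(void) { return s(and); }
```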

David Brown

2017-04-25 14:02:40 UTC

Post by bartc

Post by David Brown

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

So you're saying they're all different? In which case, the answer is to
cripple the language to work to the lowest common denominator? If one
assembler only works in upper case, the language should only allow upper
case too.

I am saying that there are lots of assemblers, many of which use
characters such as $ for a variety of purposes, making it hard to use
them in identifiers as "letters".

Standard C lets you use a-z, A-Z and _ as "letters", along with the
digits 0-9, in identifiers. Any assembler that can be used as a backend
for a C compiler will support these (in a case-sensitive manner).

A compiler /may/ support additional "letters" such as $. But this will
only work if the compiler supports it, /and/ the assembler supports it -
this support is implementation dependent.

How can this be so difficult to understand?

Post by bartc
There are sometimes external influences, for example linkers limiting
the lengths of global names. But even there it is possible to get around the
restrictions (as MSDOS did with allowing longer, free-format filenames
within the 8.3 filename format).

You might choose to do something like that - it is certainly possible.
But for a variety of reasons, people (compiler writers and compiler
users) prefer that there is a very close match between compiler
identifiers and assembly identifiers - it makes it far easier to read
the generated assembly and other listing and map files, and makes it
easier to link the file with other files. (For example, identifiers are
preceded by "_" to avoid conflicts with register names.) You can read
about the unpleasantness of C++ mangled function names to see the
disadvantages of such mangling. (For C++, it was an unavoidable cost.)
I cannot imagine anyone wanting to go through this simply to allow $
symbols in identifier names.

Post by bartc
Anyway these are all very easy to fix. How long does it take to write an
assembler? It must be a lot easier than writing a compiler!

I would imagine that writing an assembler is a good deal easier than
writing a compiler. It is still much more effort than /not/ writing an
assembler and simply using existing ones. Would you write a new
assembler just to allow $ symbols in C identifiers?

Post by bartc

Post by David Brown
There are /many/ assemblers out there. Your experience is with a couple
of x86 tools and maybe a brief look at something for ARM - your
extrapolations of your narrow (albeit in-depth) experiences as though
they were general rules, makes you look silly.

In the 70s I used assemblers for 6800, some 16-bit machine, and for
PDP10. I just checked the manual for the latter's assembler ('macro10'),
and it allows dollars in symbol names.
In the 80s, I wrote assemblers for 8051, Z80 and 80186. In the 90s, I
wrote inline assemblers for 80386. For a while I wrote my own linkers.
Very narrow? Maybe. But we're talking about allowing a $ as part of a
name; it's not rocket science.

That's fine - for those assembly languages that traditionally do not use
$ for other purposes.

Post by bartc

Post by David Brown
Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.

Again, these should not be a consideration when designing a language (if
it's successful, the tools will follow), otherwise nothing would ever
progress.
(BTW have you checked whether those tools do in fact support $? The fact
that gcc does so, suggests that they do.)

No, I have not bothered checking such tools - because $ is not something
that I or many other people use in identifiers, even when it is
supported by a compiler and assembler.

Post by bartc

Post by David Brown
And thus today's situation - that some compilers reject $ altogether,
and that others accept it if the backend assembler is happy, is the best
that can be done.

In that case why bother at all?

That's a very relevant question. I can't see many good uses of $ in
identifier names. They /might/ be of interest for some libraries as
"internal" symbols, simply because they have a very low risk of clashing
with any normal user symbols because people rarely use $.

Post by bartc
I'm in the annoying situation where I like to use $ rather than _ for
special 'decoration', but one or two out of the six C compilers I test
with don't support $. Even though they support the same targets.

I can see that as being a possible good use of $. You either have to
limit the choice of backend C compilers you support, or you find an
alternative decoration.

Post by bartc
The odd one out is Tiny C, which is too useful a compiler to decide not
to support. There is also lccwin, which supports $ but not at the start
of a name.

Post by David Brown
This is for practical and sensible reasons - not
because compiler writers are lazy, stupid, stuck in the 70's, or did not
think of adding an extra "case" line to their software.

In the case of Tiny C, I think everything is done internally. So what's
the excuse, saving a few bytes of code? It can't be that desperate to
stay tiny!

Tim Rentsch

2017-05-02 12:25:04 UTC

Post by David Brown

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

So you're saying they're all different? In which case, the answer is
to cripple the language to work to the lowest common denominator?

The principal point of having a standardized language is to be
able to write programs that will run in all target environments.
That essentially requires making use of only those elements that
are in the common subset (ie, the largest common subset, not the
smallest common subset - the phrase "lowest common denominator"
is a misnomer).

s***@casperkitty.com

2017-05-02 17:04:20 UTC

Post by Tim Rentsch
The principal point of having a standardized language is to be
able to write programs that will run in all target environments.

That will run in all target environments *of interest*.

Post by Tim Rentsch
That essentially requires making use of only those elements that
are in the common subset (ie, the largest common subset, not the
smallest common subset - the phrase "lowest common denominator"
is a misnomer).

The phrase "to cater to the lowest common denominator" is a common idiom
in American English; the phrase "greatest common factor" would probably be
more appropriate mathematically, but doesn't match the idiomatic usage.

It makes sense to limit code to those elements that will be common to all
platforms of interest. Is there any reason code should need to avoid using
features that would be unsupportable on 1% of platforms, if those platforms
would be incapable of running the code for other reasons anyway?

If implementation writers can be expected to support features and guarantees
beyond those mandated by the Standard in cases where it makes sense to do so,
then there would be no need for the Standard to mandate all useful behaviors.
The failure of the Standard to mandate behavioral guarantees which would make
sense on the vast majority of platforms is consistent with such expectation,
and is not consistent with an expectation that compiler writers will interpret
permission to do anything as a judgment that anything they might do should be
viewed as sensible merely because it is permitted.

Tim Rentsch

2017-05-05 10:06:31 UTC

Post by s***@casperkitty.com

Post by Tim Rentsch
The principal point of having a standardized language is to be
able to write programs that will run in all target environments.

That will run in all target environments *of interest*.

That's already included in the "target environment" phrase. What
makes an environment be a target is that it is of interest.

Post by s***@casperkitty.com

Post by Tim Rentsch
That essentially requires making use of only those elements that
are in the common subset (ie, the largest common subset, not the
smallest common subset - the phrase "lowest common denominator"
is a misnomer).

The phrase "to cater to the lowest common denominator" is a common idiom
in American English; the phrase "greatest common factor" would probably be
more appropriate mathematically, but doesn't match the idiomatic usage.

You really just enjoy being an annoying prick don't you.

s***@casperkitty.com

2017-04-25 14:12:20 UTC

Post by David Brown
Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.

Nothing would preclude a compiler from allowing the use of dollar signs in
macros or in symbols that are neither imported nor exported, no matter what
an assembler chose to do. As for imported/exported names, I think C would
have benefited from a qualifier or directive which would set the import/export
name for any symbol using a string literal, thus making it possible to e.g.
call an outside function named "restrict", and from a directive allowing any
character or characters which have no other meaning to be included within
the set that C will regard as identifier characters. Even if the latter
ability didn't apply to exported names, code could still say, e.g.

// Assuming a directive has indicated that $ should be allowed in macros
#define $foo DOLLAR_FOO

and then use $foo everywhere else.

Robert Wessel

2017-04-25 14:48:05 UTC

On Tue, 25 Apr 2017 13:07:15 +0200, David Brown

Post by David Brown

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.
We're in the 21st century not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

Post by Noob
If the latter, did you write the assembler, or do you rely
on the system's assembler?

Let me reiterate, in either a compiler /or/ assembler, support for $ can
....
....
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

Post by bartc
then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

There are /many/ assemblers out there. Your experience is with a couple
of x86 tools and maybe a brief look at something for ARM - your
extrapolations of your narrow (albeit in-depth) experiences as though
they were general rules, makes you look silly.
Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.
You will probably also find that using a $ in identifiers causes
challenges or incompatibilities with all sorts of other tools, such as
IDE's with syntax highlighting, tag generators, automatic code
documenters, etc.
And thus today's situation - that some compilers reject $ altogether,
and that others accept it if the backend assembler is happy, is the best
that can be done. This is for practical and sensible reasons - not
because compiler writers are lazy, stupid, stuck in the 70's, or did not
think of adding an extra "case" line to their software.

While I rarely agree with bartc, I have zero sympathy with the notion
that the limitations of some unrelated bit of software used internally
in a compiler implementation should dictate limitations on the
language implemented.

That being said, it's clear that some accommodations have been made
(the limits on required significance on external names in C89, for
example). OTOH, there is nothing in the current scheme that makes
name conflicts impossible either.

Robert Wessel

2017-04-25 14:52:51 UTC

On Tue, 25 Apr 2017 09:48:05 -0500, Robert Wessel

Post by Robert Wessel
On Tue, 25 Apr 2017 13:07:15 +0200, David Brown

Post by David Brown

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.
We're in the 21st century not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

Post by Noob
If the latter, did you write the assembler, or do you rely
on the system's assembler?

Let me reiterate, in either a compiler /or/ assembler, support for $ can
....
....
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode),

What is it that you think is rare? That compilers work with existing
assemblers? Most compilers generate assembly outputs, and most make use
of existing assemblers.
Or do you think that it is rare for an assembler to use a dollar sign in
some special way that is incompatible (or at least inconvenient) with
using dollars in identifiers?
I know of assemblers that use $ to indicate local variables, assemblers
that use it for hex literals (rather than C's 0x), assemblers that use
it for register names, assemblers that use it for "current position" -
and assemblers that simply reject it as a letter.

Post by bartc
then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

There are /many/ assemblers out there. Your experience is with a couple
of x86 tools and maybe a brief look at something for ARM - your
extrapolations of your narrow (albeit in-depth) experiences as though
they were general rules, makes you look silly.
Yes, an escape mechanism of some sort /could/ be used - but there are no
forms of reserved identifiers that could sensibly be used, so there is
nothing to stop conflict with other valid identifiers. You would have
to have a silly long escape sequence - and that could conflict with
length restrictions on assembly identifiers, and it would certainly look
strange to anyone examining listing files, linker map files, etc., or
using a debugger. And of course everything would fail if you tried to
link these symbols with code generated with other tools.
You will probably also find that using a $ in identifiers causes
challenges or incompatibilities with all sorts of other tools, such as
IDE's with syntax highlighting, tag generators, automatic code
documenters, etc.
And thus today's situation - that some compilers reject $ altogether,
and that others accept it if the backend assembler is happy, is the best
that can be done. This is for practical and sensible reasons - not
because compiler writers are lazy, stupid, stuck in the 70's, or did not
think of adding an extra "case" line to their software.

While I rarely agree with bartc, I have zero sympathy with the notion
that the limitations of some unrelated bit of software used internally
in a compiler implementation should dictate limitations on the
language implemented.
That being said, it's clear that some accommodations have been made
(the limits on required significance on external names in C89, for
example). OTOH, there is nothing in the current scheme that makes
name conflicts impossible either.

Which is not to say that I find the absence of dollar signs in
identifiers to be much of an issue.

Noob

2017-04-26 06:46:27 UTC

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.

OK, Mr Smarty Pants. How do you deal with identifiers with
external linkage? (C99 6.2.2)

Post by bartc
We're in the 21st century, not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

New features have cost-benefit trade-offs.
It's impossible to perform a useful analysis when one's
experience is limited to Windows/x86.

Post by bartc

Post by Noob
If the latter, did you write the assembler, or do you rely
on the system's assembler?

Let me reiterate, in either a compiler /or/ assembler, support for $ can

Do you believe I didn't understand the first time?

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode), then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

I checked for gas, the assembler overwhelmingly used on
Linux and *BSD systems.

https://sourceware.org/binutils/docs/as/Symbol-Intro.html
https://sourceware.org/binutils/docs/as/Symbol-Names.html

Symbol names begin with a letter or with one of `._'. On most
machines, you can also use $ in symbol names; exceptions are noted in
Machine Dependencies. That character may be followed by any string of
digits, letters, dollar signs (unless otherwise noted for a
particular target machine), and underscores.

Are you suggesting gcc, clang, icc, and every other compiler ever
made should implement the same "escape mechanism"? Then this would
have to be part of some standard, presumably ISO/IEC 9899?

EOT

bartc

2017-04-26 10:46:21 UTC

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?

Huh? We're talking about input source not output.

OK, Mr Smarty Pants. How do you deal with identifiers with
external linkage? (C99 6.2.2)

Post by bartc
We're in the 21st century, not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

New features have cost-benefit trade-offs.
It's impossible to perform a useful analysis when one's
experience is limited to Windows/x86.

What's your vast experience then?

I wrote my first compiler (of sorts) on a DECSystem-10. In assembly
code. And the assembler supported "$" in names (whether or not I made
use of that feature, I can't remember). And the first interpreter (of
sorts) I played with was on a PDP11. No Windows or x86 in sight.

Post by Noob
Are you suggesting gcc, clang, icc,

Um, gcc, clang and icc appear to support "$" in identifiers.

In fact most C compilers seem to do so. So I'm not sure what we're
arguing about.

The problem for me is that one or two that I use don't support it. And
they both use their own assemblers and linkers so compatibility with
other tools isn't the reason.

--
bartc

bartc

2017-04-26 11:00:50 UTC

Post by bartc
We're in the 21st century, not back in the 70s where names were limited
to 6 upper case characters (because of 'radix-50' or 'sixbit' or whatever).

New features have cost-benefit trade-offs.

BTW 'sixbit', which only allows 64 characters, includes '$'

Radix-50, which only allows /40/ characters (the 50 is octal), also includes '$'.

So you couldn't have a-z, but you could have dollar symbols. That
suggests '$' was considered important (by DEC at least).
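
As a rough illustration of how tight that encoding is, here is a sketch of RADIX-50 packing in C. The 50 in the name is octal, so the set has 40 symbols and three of them fit in one 16-bit word. The table below follows one common PDP-11 ordering (other DEC systems ordered the set differently, and position 29 was not always '%'), so treat the table as an assumption; `rad50_pack3` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed PDP-11 style ordering: space, A-Z, '$', '.', '%', 0-9. */
static const char rad50_chars[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789";

/* Packs exactly three characters into one value below 40*40*40 = 64000.
   Returns -1 if the string is too short or contains a character that is
   not in the set (any lower-case letter, for instance). */
long rad50_pack3(const char *three)
{
    long word = 0;
    for (size_t i = 0; i < 3; i++) {
        const char *p = three[i] ? strchr(rad50_chars, three[i]) : NULL;
        if (p == NULL)
            return -1;
        word = word * 40 + (long)(p - rad50_chars);
    }
    return word;
}
```

With this table `rad50_pack3("A$B")` packs to a valid word while `rad50_pack3("a$b")` is rejected, which is the trade-off described above: dollar signs yes, lower case no.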

--
bartc

Gordon Burditt

2017-05-08 03:25:40 UTC

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

What platform are you using that needs this?

I've seen plenty of assemblers that use $ as a name for the current
location counter. Most of these parse $ as a single-character
token, so anything like aaa$bbb will be a syntax error, for the
same reason that three consecutive identifiers like aaa bbb ccc
would be a syntax error. It is possible to allow symbols containing
$ in these assemblers without parsing ambiguity. This may require
substantial code changes.
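
To make the tokenizing point concrete, here is a toy lexer sketch in C. It is not modelled on any particular assembler; `count_tokens` and `is_symbol_char` are hypothetical names, and the input is assumed to contain no whitespace.

```c
#include <ctype.h>

/* Nonzero if c may appear inside a symbol for this toy lexer. */
static int is_symbol_char(unsigned char c, int dollar_in_symbols)
{
    return isalnum(c) || c == '_' || (dollar_in_symbols && c == '$');
}

/* Counts tokens: a run of symbol characters is one token, and any other
   character (such as '$' used as the location counter) is a token by itself. */
int count_tokens(const char *s, int dollar_in_symbols)
{
    int n = 0;
    while (*s != '\0') {
        if (is_symbol_char((unsigned char)*s, dollar_in_symbols)) {
            while (*s != '\0' &&
                   is_symbol_char((unsigned char)*s, dollar_in_symbols))
                s++;
        } else {
            s++;
        }
        n++;
    }
    return n;
}
```

With `dollar_in_symbols` set to 0, `aaa$bbb` comes out as three tokens (symbol, `$`, symbol), which a parser would then reject. Setting it to 1 makes it a single symbol, but a bare `$` then also lexes as a symbol, so the location-counter case needs separate handling.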

There are also plenty of assemblers that use $ as a prefix for a
machine instruction operand, sometimes to indicate that it is an
immediate operand (rather than a memory reference). Others use $
as a prefix for a constant to indicate that it is in hexadecimal.
This can be compatible with the use of $ in identifiers if an
identifier is prohibited from *BEGINNING* with a $. Is that
acceptable for the OP's intended usage?

Post by bartc
Let me reiterate, in either a compiler /or/ assembler, support for $ can
....
....

That's unlikely. For one thing, a C compiler needs two different
character classes, one for the characters that can *BEGIN* an
identifier, and one for the characters after the first in an
identifier (digits are part of the second class but not the first).
It is probably more efficient for a compiler to use the C locale
and <ctype.h> functions. Or, if they're allowing all the characters
in Unicode (Unicode has properties "ID_Start" and "ID_Continue" for
this, and they seem to apply to all of a random sampling of Egyptian
Hieroglyphics), that would probably involve a switch with over 10,000
cases.

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode), then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

Definition: system linker - the linker for a platform that is the first
one in this list, if there is one at all:
(a) The linker that ships with the system.
(b) The linker from an add-on development package that is distributed by
the same vendor as the system.
(c) The most-used linker for the platform if neither (a) nor (b) is available.
(d) There may not even BE a system linker.

If there is a system linker, it's generally very desirable that you be
compatible with it. If there is no system linker, then chances are your
implementation will become one, and you don't have to worry about backward
compatibility with anything but older versions of your code.

IMHO, a compiler for a platform should not accept $ in identifiers
by default if the system linker won't accept them and it doesn't
ship with a linker that accepts those identifiers. In other words,
don't confuse the user with vague linker errors when the compiler
should have caught the error earlier and there *ISN'T* a linker (or
escape mechanism) that will work.

Devising an escape mechanism that is an extension, not an incompatible
change, may not be easy (or even possible, given compatibility
requirements). If there's an existing escape mechanism, great,
extend it. URL-encoding may come to mind if the system linker
accepts symbols with a % in it, but it may be impossible to stay
compatible with the system linker and your linker with arbitrary
use of % permitted and pre-existing. If the system linker generates
'foo%' and yours generates 'foo%25', they won't match up. The
system linker's 'foo%25' may incorrectly match your 'foo%' (which
escaped is 'foo%25') and mess up with a bogus duplicate symbol
error.
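
A sketch of the URL-style escaping discussed above, to show where the collision comes from. This is an illustration under assumptions, not any real toolchain's scheme: `escape_symbol` is a hypothetical function, the "safe" set is taken to be letters, digits and '_', and the hex values shown assume ASCII.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Writes an escaped copy of name into out (capacity outsz, terminator
   included). Letters, digits and '_' pass through; every other byte,
   including '%' itself, becomes %XX. Returns 0 on success, -1 if out
   is too small. */
int escape_symbol(const char *name, char *out, size_t outsz)
{
    size_t used = 0;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        if (isalnum(c) || c == '_') {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = (char)c;
        } else {
            if (used + 3 >= outsz)
                return -1;
            snprintf(out + used, 4, "%%%02X", (unsigned)c);
            used += 3;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Here `foo%` escapes to `foo%25`. If the system toolchain already emits `%` unescaped, its own symbol `foo%25` is spelled exactly like the escaped `foo%`, which is the bogus match described above.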

Assemblers are also likely to have all sorts of other reserved
identifiers, such as opcode names, register names, pseudo-op names,
etc. In the past this was dealt with by prefixing all C identifiers
with "_".

Incidentally, in C, the backquote is already taken for BCD constants,
although that usage was pretty obsolete even in 1978.
@ may be reserved in assembly language as a prefix for register names.

bartc

2017-05-08 09:35:00 UTC

Post by Gordon Burditt

Post by bartc

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

What platform are you using that needs this?

The platform doesn't need it; /I/ do because I like using $ to ensure a
special identifier won't clash with a normal one. "_" is not as good for
that purpose because user identifiers are full of "_", and it doesn't
stand out as well.

Post by Gordon Burditt

Post by bartc
Let me reiterate, in either a compiler /or/ assembler, support for $ can
....
....

That's unlikely. For one thing, a C compiler needs two different
character classes, one for the characters that can *BEGIN* an
identifier, and one for the characters after the first in an
identifier (digits are part of the second class but not the first).

OK, two lines if the cases are each put on their own line. But here are
the two relevant lines from my own C compiler (not C code):

when 'A'..'Z','a'..'z','$','_' then
when 'A'..'Z','a'..'z','$','_','0'..'9' then

So providing support for "$" doesn't require any more lines, it just
needs a minor tweak on two lines. (And no, it won't work on EBCDIC, but
I don't care!)

(Actually it could be reduced to a one-line tweak if necessary.)
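
For readers who think in C, the two "when" lines above might translate
roughly as follows. This is a sketch, not the code of the compiler being
discussed: the function names and the allow_dollar flag are invented for
illustration, and the range comparisons assume an ASCII-compatible
character set.

```c
#include <stdbool.h>

/* Characters that may begin an identifier (ASCII assumed).
   allow_dollar is a hypothetical switch for the '$' extension. */
bool is_ident_start(int c, bool allow_dollar)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || (allow_dollar && c == '$');
}

/* Characters that may follow the first one: the same set plus digits. */
bool is_ident_continue(int c, bool allow_dollar)
{
    return is_ident_start(c, allow_dollar) || (c >= '0' && c <= '9');
}
```

Because the second class is defined in terms of the first, accepting '$'
touches a single expression here, which is roughly the one-line tweak
mentioned above.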

Post by Gordon Burditt
It is probably more efficient for a compiler to use the C locale
and <ctype.h> functions. Or, if they're allowing all the characters
in Unicode (Unicode has properties "ID_Start" and "ID_Continue" for
this, and they seem to apply to all of a random sampling of Egyptian
Hieroglyphics), that would probably involve a switch with over 10,000
cases.

We're not talking about allowing any Unicode character in an identifier.
This is a small systems language and the $ is useful as a more solid
partition between two classes of identifiers than _.

(Besides you wouldn't use a 10,000-way switch; if the character wasn't
ASCII, you'd look at the identifier attribute directly. But they might
need special treatment if there would be a problem when used with
external assemblers or linkers.)

Post by Gordon Burditt

Post by bartc
In the rare case of a compiler working with an existing assembler that
doesn't support '$' (or any other character, eg. Unicode), then it will
need to devise some escape mechanism. Not take it out on the user by
saying you can't do this because there might be the odd assembler out
there that can't handle it.

Definition: system linker - the linker for a platform that is the first
one in this list, if there is one at all:
(a) The linker that ships with the system.
(b) The linker from an add-on development package that is distributed by
the same vendor as the system.
(c) The most-used linker for the platform if neither (a) nor (b) are available.
(d) There may not even BE a system linker.
If there is a system linker, it's generally very desirable that you be
compatible with it. If there is no system linker, then chances are your
implementation will become one, and you don't have to worry about backward
compatibility with anything but older versions of your code.
IMHO, a compiler for a platform should not accept $ in identifiers
by default if the system linker won't accept them and it doesn't
ship with a linker that accepts those identifiers. In other words,
don't confuse the user with vague linker errors when the compiler
should have caught the error earlier and there *ISN'T* a linker (or
escape mechanism) that will work.
Devising an escape mechanism that is an extension, not an incompatible
change, may not be easy (or even possible, given compatibility
requirements). If there's an existing escape mechanism, great,
extend it. URL-encoding may come to mind if the system linker
accepts symbols with a % in it, but it may be impossible to stay
compatible with the system linker and your linker with arbitrary
use of % permitted and pre-existing. If the system linker generates
'foo%' and yours generates 'foo%25', they won't match up. The
system linker's 'foo%25' may incorrectly match your 'foo%' (which
escaped is 'foo%25') and mess up with a bogus duplicate symbol
error.
Assemblers are also likely to have all sorts of other reserved
identifiers, such as opcode names, register names, pseudo-op names,
etc. In the past this was dealt with by prefixing all C identifiers
with "_".
Incidentally, in C, the backquote is already taken for BCD constants,
although that usage was pretty obsolete even in 1978.
@ may be reserved in assembly language as a prefix for register names.

You're talking as though this problem hasn't already been solved decades
ago, and probably dozens of times since.

--
bartc

bartc

2017-05-08 09:53:32 UTC

Post by bartc
when 'A'..'Z','a'..'z','$','_' then
when 'A'..'Z','a'..'z','$','_','0'..'9' then
So providing support for "$" doesn't require any more lines, it just
needs a minor tweak on two lines. (And no, it won't work on EBCDIC, but
I don't care!)

Actually, if this was translated to C, as would be necessary to run on
such a machine, then it might well do. Because C case values are all
separate. Even more reason not to care.
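
A sketch of what such a translation could look like, assuming standard C
with no case-range extension. The function name is hypothetical. Each
letter gets its own case label, so nothing depends on the letters being
contiguous in the execution character set; the only assumption is that
the character set contains '$' at all.

```c
#include <stdbool.h>

/* Hypothetical helper: true if c may begin an identifier.
   Every label is a separate constant, so gaps between letter codes
   (as in EBCDIC) do not matter. */
bool ident_start_switch(int c)
{
    switch (c) {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
    case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
    case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
    case 'V': case 'W': case 'X': case 'Y': case 'Z':
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z':
    case '$': case '_':
        return true;
    default:
        return false;
    }
}
```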

--
bartc

David Kleinecke

2017-05-08 18:47:57 UTC

Post by bartc

Post by bartc
when 'A'..'Z','a'..'z','$','_' then
when 'A'..'Z','a'..'z','$','_','0'..'9' then
So providing support for "$" doesn't require any more lines, it just
needs a minor tweak on two lines. (And no, it won't work on EBCDIC, but
I don't care!)

Actually, if this was translated to C, as would be necessary to run on
such a machine, then it might well do. Because C case values are all
separate. Even more reason not to care.

Standard way to handle
when 'A'..'Z','a'..'z','$','_' then
is to map the characters through a character array to two
characters and do an "if" on the result
if (map[x] == 'a') ....

Obviously generalizable to more cases.
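
One possible reading of this, sketched in C. The names class_map,
init_class_map and char_class are invented for illustration, and the
choice of class letters ('a' for identifier-start characters, '0' for
digits, ' ' for everything else) is an assumption, not something the
description above specifies.

```c
#include <limits.h>

/* One class letter per possible unsigned char value. */
char class_map[UCHAR_MAX + 1];

/* Fill the table once at start-up. */
void init_class_map(void)
{
    const char *starts =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz$_";
    const char *digits = "0123456789";
    const char *p;
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        class_map[i] = ' ';                       /* default: other */
    for (p = starts; *p != '\0'; p++)
        class_map[(unsigned char)*p] = 'a';       /* may begin an identifier */
    for (p = digits; *p != '\0'; p++)
        class_map[(unsigned char)*p] = '0';       /* digit */
}

/* Look up the class of a character; the cast keeps the index
   non-negative even where plain char is signed. */
char char_class(int c)
{
    return class_map[(unsigned char)c];
}
```

A test such as char_class(x) == 'a' then plays the role of the first
"when" line, and char_class(x) != ' ' the role of the second.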

bartc

2017-05-08 18:59:25 UTC

Post by David Kleinecke

Post by bartc

Post by bartc
when 'A'..'Z','a'..'z','$','_' then
when 'A'..'Z','a'..'z','$','_','0'..'9' then
So providing support for "$" doesn't require any more lines, it just
needs a minor tweak on two lines. (And no, it won't work on EBCDIC, but
I don't care!)

Actually, if this was translated to C, as would be necessary to run on
such a machine, then it might well do. Because C case values are all
separate. Even more reason not to care.

Standard way to handle
when 'A'..'Z','a'..'z','$','_' then
is to map the characters through a character array to two
characters and do an "if" on the result
if (map[x] == 'a') ....
Obviously generalizable to more cases.

Not sure what you mean here.

But that second sequence is actually used to initialise a character map
(char[256], where each element is 1 if the corresponding character
belongs in the set, 0 otherwise).

Testing such a map was a fast way, after a particular kind of token was
identified, to probe for the end of the token:

while (alphamap[*lxsptr++]);
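
A self-contained sketch of that technique, with hypothetical names
(init_alphamap, ident_length) and the assumption that the input is a
NUL-terminated string. Two details differ from the one-liner above: the
pointer is read as unsigned char so the index can never be negative, and
the increment happens only after a successful test, so the pointer stops
on the first character outside the set rather than one past it.

```c
#include <stddef.h>

/* 1 for characters that may appear inside an identifier, 0 otherwise.
   Entry 0 stays 0, so the NUL terminator always ends a scan. */
unsigned char alphamap[256];

void init_alphamap(void)
{
    const char *set =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz$_0123456789";
    const char *p;

    for (p = set; *p != '\0'; p++)
        alphamap[(unsigned char)*p] = 1;
}

/* Number of leading characters of s that belong to the set. */
size_t ident_length(const char *s)
{
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *p = start;

    while (alphamap[*p])
        p++;
    return (size_t)(p - start);
}
```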

If you mean using a map to classify sets of characters as alpha etc,
then that's a technique I used in the 80s; I like a change...

--
bartc

Kenny McCormack

2017-04-25 10:43:20 UTC

Post by bartc
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Are you generating machine code or assembly?
If the latter, did you write the assembler, or do you rely
on the system's assembler?
Maybe you shouldn't assume you're smarter than everyone else.
(Or maybe I should drop you back in my KF, most of your posts
just raise my blood pressure.)

Or maybe you should see a doctor. (They have drugs for that now)

--
The plural of "anecdote" is _not_ "data".

Robert Wessel

2017-04-25 14:33:31 UTC

Post by bartc

Post by maximus fl
This is odd. gcc and clang both compile this program with no problems, and the program runs just fine. I can't find a reason why.
#include <stdio.h>
int main(void) {
    int A$ = 10;
    int $B = 25;
    int z = $B;
    int c = 0;
    $B = A$ + $B;
    printf("A$ is: %i and $B is: %i and the sum is %i\n", A$, z, $B);
    c = A$ + $B;
    printf("c :=> %i\n", c);
    return 0;
}
This works, but the C standard says this variable name should not be valid. Other symbols I have tried cause a compiler error, but this does not. Does anyone know why? Is there something special about the $? The $ is a special character in C, but I can't locate anything about it. Does anyone have any thoughts on this? Thanks in advance.

Most compilers I've tried accept $ in names. Some don't (eg. Tiny C), so
it is necessary to avoid them in order to have code that compiles on
anything.
It's silly really because it can be literally a one-line change in a
compiler to allow $.

Back then, you'd often not have a $ on many nominally "ASCII" systems,
since it was often replaced with a local symbol. For example, many
British systems would have displayed that codepoint as a pounds
sterling sign.

One of the motivations for trigraphs was that some ("ASCII") systems
replaced brace and bracket characters with things like accented
characters. At least the latter was not that common when doing
programming on such systems (and that particular application of
trigraphs was mostly used on EBCDIC systems).

Scott Lurndal

2017-04-25 14:48:13 UTC

This is odd. gcc and clang both compile this program with no problems, and the program runs just fine. I can't find a reason why.
#include <stdio.h>
int main(void) {
    int A$ = 10;
    int $B = 25;
    int z = $B;
    int c = 0;
    $B = A$ + $B;
    printf("A$ is: %i and $B is: %i and the sum is %i\n", A$, z, $B);
    c = A$ + $B;
    printf("c :=> %i\n", c);
    return 0;
}
This works, but the C standard says this variable name should not be valid

Most C compilers accept $ in variable names. All the VMS
system calls have $ in the names, for example (e.g. SYS$QIOW)
which may be part of the reason that so many compilers support it.
