Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The term, as you know, is "undefined behavior". Hiding it behind extra
wording is not helpful.
Many useful tasks cannot be performed efficiently, or at all, without using
behaviors which are defined by some implementations but are not mandated by
the Standard. What terminology would you use to distinguish those behaviors
from behaviors which are not defined by anything whatsoever?
Implementation-defined behaviour, or target-specific behaviour. The C
standards specifically refer to some behaviours as "implementation
defined", which means the implementation must define (and document) the
behaviour. Implementations can freely add other defined behaviour (as
long as it does not contradict standards-defined behaviour).
The phrase "Implementation-Defined" behavior, as used by the Standard, refers
to situations where ALL implementations are REQUIRED to document something
about the behavior. Perhaps "platform-defined behavior" would be a
suitable term.
That seems a reasonable choice to me. But I'd wait for others such as
Keith to express an opinion.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
The fact that programs which invoke upon not-defined-by-anything behaviors
are broken does not imply that programs which invoke not-defined-by-the-
Standard-but-instead-defined-by-other-things behaviors are broken. Some
compiler writers seem unable to distinguish those categories, however.
A compiler will follow "defined by the standards" /and/ "defined by the
implementation" behaviours. This second part includes extensions,
implementation-defined behaviour, and any cases where the implementation
specifically provides definitions for things that the standards leave
undefined.
And if all general-purpose implementations for a platform have processed a
certain behavior a certain way, quality general-purpose implementations should
continue to do likewise unless they document a compelling reason to do
otherwise.
No. If an implementation wants to give you behaviour that you can
rely on, it should document it. Otherwise you are on your own - you
/can/ write code, compile it, and see that it works as you expect, but
you should not expect it to remain working if you change compiler flags,
update to a new version, or make other changes.
What you are suggesting would be a recipe for stagnation - compilers
could not change and improve because they would have to try to emulate
the unwritten and unspecified behaviour of other tools.
It would also make users' lives a lottery - how is the user supposed to
know what the compiler writer sees as a "compelling reason" ?
Post by s***@casperkitty.com
Post by David Brown
A compiler has /no/ obligation to follow behaviour that is defined
elsewhere. That includes behaviour that may seem "natural" for the
target platform, behaviour that other compilers support, behaviour that
existed in older C versions or standards, or behaviour that is in the
imagination of the user.
True, but a C implementation whose behavior deviates from those of existing
general-purpose compilers for similar platforms should not call itself a
quality general-purpose compiler, since general-purpose compilers should be
suitable for processing code written for pre-existing general-purpose compilers
for similar platforms.
The trick here is very simple - write code that relies on
standards-defined behaviour, and perhaps basic implementation-defined
behaviour (such as the size of an "int") if that is helpful to your code
and you don't need wide portability. Some target architectures have
specified ABIs that compilers will stick to, giving you a nice selection
of implementation-specific behaviour you can rely on across different tools.
If you need something beyond that, your code is tied tightly to the
implementation (possibly even the specific version and flags). That's
fine too - C is designed to allow that kind of coding. Your mistake is
in thinking that /other/ compilers somehow have an obligation to support
/your/ non-portable code.
Post by s***@casperkitty.com
Post by David Brown
You can imagine that every compiler is written using only the C
standards (whichever versions they support) and their own reference
manual as the requirements. /Nothing/ else matters - no other behaviour
is defined.
Compiler writers used to recognize that an ability to run code designed for
other compilers was a useful feature, and would thus take into account the
behavior of any compilers they might be competing with.
Compiler writers will often support extensions that exist on other
compilers. They might take inspiration from others in how they specify
implementation-defined behaviour.
But I have never heard of a compiler trying to emulate another
compiler's treatment of undefined behaviour. Can you give real-life
examples?
Even if there is, I don't see that as being a useful feature, except
perhaps to propagate bugs and poor coding. And since it might hinder
new features or optimisations, it could be a /bad/ thing. If I write
code that relies on unwritten details of an implementation (and that
sometimes happens), then the code is specific to the compiler and flags
- I would not even bother trying to compile it with another tool without
extensive testing, checking and qualification.
It is a different thing entirely if the compiler /documents/ the
behaviour, perhaps using a compiler switch. For example, it would be a
bad idea for gcc to emulate old compilers' behaviour of wrapping signed
integer overflows, because it hinders optimisations. But it is a fine
idea to provide a documented "-fwrapv" switch which enables such
wrapping behaviour. /That/ is how you deal with compatibility with old
code that relies on specific undefined behaviour.
Post by s***@casperkitty.com
No law requires
that any compiler be compatible with non-mandated features of any other, but
a language where compiler writers try to do so will be more useful than one
where they don't.
Post by David Brown
So where are the definitions for your
"not-defined-by-the-Standard-but-instead-defined-by-other-things"
behaviours? And why do you think that compiler writers need to obey
definitions from unrelated places?
Among other things, in the corpus of programs that will work just fine on a
wide range of older compilers, but get tripped up by modern ones. If one of
the purposes of a compiler is to be suitable for use with a corpus of existing
code, then the corpus of code will, essentially by definition, establish
what would be needed to make a compiler suitable for that purpose.
The main target for a compiler is correctly written code. If it does
not work well with broken code that happened to work on other compilers,
that's fine. /I/ don't want a compiler that is hobbled so that it works
with /your/ old broken code. I want a compiler that does the best job
for /correct/ code. I also want it to be helpful in telling me about
broken code - if I make a mistake, I would prefer to be informed about
it, rather than for the compiler to try to guess what I meant based on
what somebody might have meant in code long ago.
People with old code that is badly written should stick to old compilers
that they have tested with that code. Since the behaviour of their code
is by definition undefined, there is no way for a new compiler to be
sure it supports these unwritten rules. Only in some specific cases,
such as wrapping behaviour for integer overflow, is it even possible for
a new compiler to support the old broken code.
Post by s***@casperkitty.com
I would further suggest that on most platforms a compiler that was tasked
with generating code in "mindless-translator" fashion would in many cases
not be able to avoid exposing useful behaviors which are documented by the
environment without having to generate extra code for that purpose.
If they are documented by the tools, that's fine. If by "documented by
the environment", you mean "behaviour of certain instructions on the
cpu", then that is not C - C is not an assembler. If your code relies
on such behaviour in old tools, then stick to the old tools - your code
is only suitable for use on the specific implementations you have tested
it with.
I have worked with code written for "mindless translator" compilers.
And when I have moved such code over to better compilers, I have gone
through all the code carefully, "porting" it over to standard C (or
implementation-dependent C, as necessary). I don't expect my new tools
to work like mindless translators just because the old ones did. The
alternative, which I also do, is simply to continue to use the mindless
translator tools for that code. When I have to dig out and modify 20
year old code, I use the same 20 year old compiler as I did originally.
Post by s***@casperkitty.com
In such
cases, a compiler which claims to be suitable for low-level programming on
that platform should expose such behaviors likewise. Doing so may mean that
the compiler can only achieve 50-90% of the optimizations that would otherwise
be possible, but a compiler that can process a large corpus of code and
achieve 50-90% of the possible optimizations may be much more useful for many
purposes than one which can achieve more optimizations on a few programs but
can't be trusted to yield more than 0% on the rest.
I think perhaps I put a lot more emphasis on writing good code than you
do. I just don't see it as a problem. For the types of targets where
you typically need to write "special" code that relies on weird
behaviour in order to get something of acceptable efficiency, you rarely
want to run that code on anything else anyway.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Post by Keith Thompson
The C standard defines clearly and unambiguously what it means by
"shall". The meaning depends on the context; it means one thing in a
constraint, and something else outside a constraint.
Most standards specify what conforming entities have to "do", and are quite
specific about what entities are responsible for ensuring what. The Standard
sometimes talks about obligations of programs or implementations, but
sometimes uses "shall be" to impose obligations upon grammatical constructs.
Read chapter 4 of the standards. It tells you what "shall" and "shall
not" mean.
Post by s***@casperkitty.com
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International Standard
by the words "undefined behavior" or by the omission of any explicit
definition of behavior. There is no difference in emphasis among these
three; they all describe "behavior that is undefined".
Note the final sentence here. This should, I hope, put an end to your
endless claims about what you think the standards authors actually meant.
Undefined by the Standard, yes - but that is not the same as behavior
whose definition would be unnecessary to make a compiler suitable for
purposes like low-level programming.
As I have said many times, low-level programming is what I do for a
living - on a wide range of targets, with a wide range of tools. In
almost all cases, you can do it fine with standard defined behaviour,
possibly with some implementation defined behaviour. Occasionally you
need "platform defined behaviour" (the term from earlier in this post).
Very occasionally, you need something that is not really defined at
all, and known to work by inspection of the generated code or by
testing. What you /don't/ need, ever, is guesses about what behaviour
you think should have been defined, or would have been defined, or that
somebody meant to define.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
Indeed, but some programmers seem to think that permission to behave
nonsensically should be viewed, in and of itself, as a compelling reason
to behave nonsensically.
Name /one/ such programmer. Give even /one/ example where you can
clearly demonstrate that the reason a compiler behaves in a
"nonsensical" manner is purely because it is allowed to behave
"nonsensically". Of course, you can't demonstrate this from a single
program - you have to show that there are /no/ correct programs that
don't benefit in some way from the compiler feature.
The whole concept behind UB-based dead-branch elimination is that all forms
of UB are equivalent. As I've demonstrated, gcc will use the fact that an
overflow may occur while evaluating two "unsigned short" values as a basis
for making inferences about those values, even if the result is truncated
mod 65536. Can you suggest any *other* basis for gcc's behavior?
gcc's behaviour is to follow the rules of C about integer promotion.
There was a time in C's history when some tools used "value preserving
promotion" while others used "signedness preserving promotion". After
due consideration and debate, it was decided to use "value preserving
promotion". Whether you like that or not (personally, I would be
happier with /no/ promotion to int), that's the rules of C - and
consistency is vital here. gcc follows these rules and their
implications. There are a few situations where the results can then seem
strange.
But you are basically accusing the gcc authors of specifically looking
at code like your beloved uint16_t multiplication, and planning exciting
ways to confuse programmers by generating "nonsense" code just to prove
that /they/ have read the C standards.
In reality, it is nothing but a side-effect of optimisation passes that
are used to generate slightly more efficient object code from correct
source code.
Incidentally, how have the gcc authors responded to your bug report on
this? What about when you asked for improved warnings when such dead
code was eliminated? I presume that since you have told us in c.l.c.
about this a few hundred times over the past few years, you have asked
the gcc developers about it.
Oh, and of course gcc already provides options that let you get the
behaviour you presumably want here, even though there is no written
specification for it. gcc has a wide range of flags to control
optimisation - you can figure out the details yourself, or simply
disable optimisation (which is, incidentally, the default - you have to
/ask/ gcc to do the dead branch optimisation).
Post by s***@casperkitty.com
Post by David Brown
Remember, of course, that when there is no definition of the correct
behaviour, it does not make sense to say that something is
"nonsensically", no matter what it does.
There are a number of actions for which some compilers offer behavioral
guarantees and some don't, but for which a single behavior would satisfy the
behavioral guarantees of all general-purpose compilers for similar platforms.
Has any general-purpose (non-sanitizing) compiler for a two's-complement
silent-wraparound hardware *ever* defined a behavior for
(ushort1*ushort2) & 65535u
which was not consistent with performing an arithmetical computation and
mod-65536-reducing the result?
(I assume you meant to add a requirement of "int" being longer than
"short" to your list.)
That is not the question you should be asking. The question is whether
any compiler has ever defined a behaviour for such code? The answer, to
my knowledge, is no. It doesn't matter that none have defined behaviour
that is inconsistent with your expectations - none have defined it in a
way that /is/ consistent with your expectations. Some might have
happened to generate code that you like - equally, some generate code
that you /don't/ like. /None/ specify what they should generate.
(Feel free to provide references to compiler manuals if you have
examples that prove me wrong here.)
Post by s***@casperkitty.com
If processing such code in such fashion
would make a compiler compatible with a wider range of code than would
any other treatment, and would not impede performance, I'd say that such
behavior would be desirable in a compiler that is intended to be suitable
with the maximum corpus of existing code.
I know you'd say that - you have done so many, many times. Your logic
is still invalid, as are your premises.