Post by bartc
Post by David Brown
Different languages have different ways of defining (or leaving
undefined) the behaviour of impossible or meaningless actions. If we
assume 16-bit int (because the numbers are easier), and have x = 32767,
then what should be the result of adding 1 to x? Standard mathematics
says it should be 32768, and anything else is wrong. Typical processors
say it should be -32768, because that is the easiest in the hardware
design. Some algorithms would much prefer 32767, because saturation is
"less wrong" than anything else. Some algorithms would prefer a NaN
type indicator, or jumping to some error routine.
C says "Adding 1 to x here is meaningless. There is no behaviour which
can be called 'correct'. So we don't give it a meaning. The compiler
can assume it never happens, if that lets it generate more efficient
code for correct cases".
You can apply the same argument to unsigned. With the same 16 bits,
what's the result of adding 1 to 65535? Mathematics says it should be
65536. Typical processors will give 0, as that is simplest. Some
algorithms prefer NaN, etc.
It's exactly the same issue!
You certainly could. The C language designers felt it was useful to
define unsigned arithmetic as modulo arithmetic.
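To make the contrast concrete, here is a minimal sketch (assuming
nothing beyond the standard <limits.h> macros):

  #include <limits.h>

  void overflow_contrast(void)
  {
      unsigned int u = UINT_MAX;
      int          i = INT_MAX;

      u = u + 1;  /* well defined: modulo 2^N arithmetic, so u == 0 */
      i = i + 1;  /* undefined behaviour: C gives this no meaning */

      (void)u;    /* suppress unused-variable warnings */
      (void)i;
  }

Same bit patterns on a typical machine, completely different status in
the language.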
It would be entirely reasonable for a language designer to say that
signed integer overflow used two's complement wraparound. But the point
is, that the /C/ designers did not do so - they specifically and
explicitly made it undefined behaviour. It would be entirely reasonable
for a language designer to make unsigned overflow undefined behaviour -
again, the /C/ designers did not do so. They could also have made it
implementation defined, but chose not to. (Though an implementation is
always free to define the behaviour if it wants to.)
When designing your own language, you can make the choices /you/ want
here. Some languages use the same system as C. Others make different
decisions. Given that /any/ behaviour on overflow (signed or unsigned)
is going to be wrong in many circumstances, but some behaviours might be
useful in some cases, a language designer has plenty of options here.
He picks one, documents it, and makes that the behaviour for his
language. For different languages, the "best" choice may well be different.
So the C language designers decided on undefined behaviour for signed
overflow, and wraparound behaviour for unsigned overflow. That gives
programmers a useful set of features, gives compilers opportunities for
some good optimisations, and can be implemented easily on a wide range
of hardware (which in the early days of C included some machines for
which any single defined overflow behaviour would have been awkward).
I am not trying to say that making signed integer overflow undefined
behaviour is the "right" choice (though I agree with the C designers in
this case). I am just trying to explain to you that that is how C
works. If you want to program in C, or make C tools, then you need to
understand the language as it really is - you can't just pretend it is
something else or works in a different way, just because you might have
preferred some differences. You can make these changes when you are
inventing the Bart-C variation of C - but not when you are writing /real/ C.
Post by bartc
The only difference might be that unsigned representation is more
widespread than two's complement. Is that the reason why C says it's a
no-no?
That may well have been a major consideration of the original C language
designers.
Post by bartc
0.1% of machines don't use two's complement, therefore the other 99.9%
aren't allowed to assume it. Furthermore, if they do, then some C
compilers will endeavour to break your code.
You misunderstand. The compiler will not "break" your code if you
assume that signed integers use two's complement wrap-around. Your code
is /already/ broken. It is incorrect C code - it is a sequence of
letters and symbols without a valid definition in C. The compiler does
not "break" it by assuming your signed integers don't overflow - your
signed integers are not allowed to overflow in C. The compiler does not
produce "correct" or "working" code when the relevant optimisations are
disabled - your C code has /no/ defined behaviour, and therefore /no/
generated code can be said to be "correct". It is a case of "garbage
in, garbage out" - depending on compiler options, you may get different
varieties of garbage out.
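To see what that means in practice, consider a minimal sketch (gcc and
clang at -O2 typically fold the comparison below to a constant, though
the exact output is of course not guaranteed - that is the point):

  #include <limits.h>
  #include <stdio.h>

  /* If signed overflow never happens, x + 1 > x holds for every int x,
     so a compiler may legitimately compile this to "return 1".  With
     x == INT_MAX the addition overflows, and the behaviour is
     undefined - there is no "correct" answer to preserve. */
  static int less_than_successor(int x)
  {
      return x + 1 > x;
  }

  int main(void)
  {
      /* With gcc, this typically prints 1 at -O2 and 0 at -O0 - two
         different varieties of garbage, neither defined by C. */
      printf("%d\n", less_than_successor(INT_MAX));
      return 0;
  }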
As Charles Babbage, the father of computing, put it:
On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
the machine wrong figures, will the right answers come out?' I am not
able rightly to apprehend the kind of confusion of ideas that could
provoke such a question.
Post by bartc
Post by David Brown
Post by bartc
Yet C has a problem with this.
No, C has no problem with this - /you/, and a few other equally mistaken
programmers, have problems with it. You appear to believe that C is not
a programming language, but merely a kind of high-level assembly - and
you are wrong. C is not, and never has been, a high-level or
semi-portable assembler.
That's a problem with C then.
No, it is a problem with /you/. You don't understand what C is - that
is /your/ problem.
You are not alone in having such problems with C - not everyone is able
to grasp how C works. But most people either learn C properly in the
end, or give up and use a different programming language (or stop
programming altogether). I cannot comprehend why you have such a stubborn
resistance to making this choice.
Post by bartc
Someone writes a program in X where signed overflow has a behaviour
consistent with two's complement. And they want to run it on machine Y
which uses two's complement. So far, so good.
That is fine, as long as "X" is not the C programming language (or any
other language that does not guarantee such overflow behaviour).
Post by bartc
But if they need to pass it through C, in order for X (which might just
be pseudo-code in the programmer's mind) to be translated into a form
that runs on Y, then the problems start.
It is not a problem in the slightest. If you are writing a translator
from language X in which signed overflow is defined, into C in which it
is not defined, then you have three choices:
1. Make it a requirement that the resulting C is treated as "C with
defined signed integer overflow" - insist on compilation with the
"-fwrapv" flag in gcc and clang, and disallow compilers that do not have
/documented guarantees/ of this type of behaviour.
2. Make it a requirement that the resulting C is compiled on machines
with two's complement signed integers, and use unions or casts to
convert between signed integers and unsigned integers as necessary to
get the required behaviour (see the sketch after this list). This will
result in ugly C intermediary code, but that should be of no relevance.
It is likely that the code will compile optimally on most compilers.
3. Generate the code in a manner that avoids signed integer overflow.
This might be a little fiddly, and certainly a bit messy in the
generated code - but it is intermediary code, and it is /allowed/ to be
messy.
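As a rough sketch of options 2 and 3 - the helper names here are
hypothetical, not from any real X-to-C translator, and option 1 needs
no code at all, just "gcc -fwrapv" or equivalent on the command line:

  #include <stdint.h>

  /* Option 2: do the addition in unsigned arithmetic, which C defines
     to wrap, then convert back.  The conversion to int32_t is
     implementation-defined for out-of-range values - gcc and clang
     both document it as modulo reduction, which gives two's complement
     wraparound. */
  static int32_t wrap_add_i32(int32_t a, int32_t b)
  {
      return (int32_t)((uint32_t)a + (uint32_t)b);
  }

  /* Option 3: avoid signed overflow entirely by testing the operands
     first, so every path has defined behaviour.  Here overflow
     saturates; a translator could equally well trap or wrap. */
  static int32_t checked_add_i32(int32_t a, int32_t b)
  {
      if (b > 0 && a > INT32_MAX - b)
          return INT32_MAX;         /* would overflow upwards */
      if (b < 0 && a < INT32_MIN - b)
          return INT32_MIN;         /* would overflow downwards */
      return a + b;                 /* in range: plain addition */
  }

The casts in wrap_add_i32 look ugly, but most compilers reduce the
whole function to a single add instruction - exactly the "ugly but
optimal" intermediary code described in option 2.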
What you don't get to do, is moan that C doesn't work the way you think
it ought to work, and claim it is C's problem. /You/ are defining
language X. /You/ are making the X-to-C compiler. It is /your/ problem.