Discussion:
Is this code considered to transgress the signed overflow purism?
j***@gmail.com
2017-04-27 10:13:45 UTC
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */

void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteMask = chMask;
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
----------------

Yes, there is a reason I am ignoring limits.h. Maybe not a good reason, but I will debate that later.

Right now I just would like to hear whether shifting unsigned values until the bits are supposed to disappear, and testing them for zero, is considered in the range of machine-defined stuff that the current crop of optimization sophomores (pardon my Latin) conflates with undefined.

Or did I walk off the world when I shifted one too far and filled the unsigned char with ones and it overflowed? (I'll look at that some more now.)

--
Joel Rees
j***@gmail.com
2017-04-27 10:41:25 UTC
Never mind. I realized I was just never calling setULbytes(). (And that is completely defined behavior in this case. Heh.)
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteMask = chMask;
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
----------------
Yes, there is a reason I am ignoring limits.h. Maybe not a good reason, but I will debate that later.
Right now I just would like to hear whether shifting unsigned values until the bits are supposed to disappear, and testing them for zero, is considered in the range of machine-defined stuff that the current crop of optimization sophomores (pardon my Latin) conflates with undefined.
Or did I walk off the world when I shifted one too far and filled the unsigned char with ones and it overflowed? (I'll look at that some more now.)
--
Joel Rees
Ben Bacarisse
2017-04-27 10:59:30 UTC
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
Apart from the side effect on i this could be written as

chMask = -1;
Post by j***@gmail.com
byteMask = chMask;
Or this could simply be

byteMask = (unsigned char)-1;
Post by j***@gmail.com
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
What's wrong with

ulBytes = sizeof (unsigned long);

?
Post by j***@gmail.com
}
----------------
Yes, there is a reason I am ignoring limits.h. Maybe not a good
reason, but I will debate that later.
I don't think you need limits.h for this.
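Putting those two suggestions together, the whole function collapses to something like this (a sketch only: it keeps the file-scope variables from the original post, and the bit-counting loop stays because limits.h is being avoided on purpose):

----------------
void setULbytes( void )
{ int i = 0;
  unsigned char chroller = 1;

  byteMask = (unsigned char)-1;      /* all ones; well-defined conversion */

  while ( chroller != 0 )            /* count the bits in a byte */
  { chroller <<= 1;
    ++i;
  }
  byteWidth = i;

  ulBytes = sizeof (unsigned long);  /* bytes per unsigned long */
}
----------------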
Post by j***@gmail.com
Right now I just would like to hear whether shifting unsigned values
until the bits are supposed to disappear, and testing them for zero,
is considered in the range of machine-defined stuff that the current
crop of optimization sophomores (pardon my Latin) conflates with
undefined.
Shifting unsigned values is well-defined provided the shift is (a)
non-negative and (b) less than the type's width. Your shift, 1, is always
fine. But, you are *not* shifting unsigned values in the first loop!
unsigned char promotes to (signed) int so you are shifting a positive
signed value by 1. That is also fine provided the result can be
represented as a signed int. Unless you have a very odd architecture,
the loop will stop before that can happen. So again, fine.
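Concretely, the last pass of that first loop - the one the original poster was worried about - looks like this (a sketch assuming the common 8-bit unsigned char and an int of at least 16 bits):

----------------
/* final iteration: chMask is 0xFF on entry                        */
/* ( chMask << 1 ) | 1  is  ( (int)0xFF << 1 ) | 1  ==  0x1FF      */
/* 0x1FF fits easily in int, so there is no signed overflow;       */
/* storing it back into the unsigned char reduces it mod 256,      */
/* leaving 0xFF                                                    */
chMask = ( chMask << 1 ) | 1;
----------------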
Post by j***@gmail.com
Or did I walk off the world when I shifted one too far and filled the
unsigned char with ones and it overflowed? (I'll look at that some
more now.)
Just assign -1. When -1 is assigned to the unsigned char (or you use a
cast since the unsigned char variable is not really needed anymore) the
result is reduced modulo one more than the maximum value that can be
represented.
--
Ben.
j***@gmail.com
2017-04-27 21:35:20 UTC
Post by Ben Bacarisse
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
Apart from the side effect on i this could be written as
chMask = -1;
Post by j***@gmail.com
byteMask = chMask;
Or this could simply be
byteMask = (unsigned char)-1;
You know, it's funny.

We have this coddling of brain-damaged CPUs when a left-shift on a negative operand is specified on sign-magnitude architectures or something, but we have this assurance that casting a -1 to an unsigned fills the thing with ones.

(I know calling it brain-damaged is too extreme, but ...)

And I have a hard time remembering that assurance is in the standard. Maybe it's because I don't really use sign-magnitude that much.

The standards committee gets to tilt at windmills, and I get to chase their rainbows.

I really wish the standards committee had instead specified a means of compile-time testing the architecture for the specific corner and edge cases, along with a way either to pop up a warning reminding the reader of the compiler output that the code wasn't intended to be compiled on that particular architecture, or to use conditionally compiled sections to deal with the corner case on those CPUs.

A few more hours of their work to help a thousand guys like me to cut a few wasted days off of tilting with our own windmills every now and again.
Post by Ben Bacarisse
Post by j***@gmail.com
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires run-time behavior.
Post by Ben Bacarisse
Post by j***@gmail.com
}
----------------
Yes, there is a reason I am ignoring limits.h. Maybe not a good
reason, but I will debate that later.
I don't think you need limits.h for this.
I wanted to be able to use limits.h for it.
Post by Ben Bacarisse
Post by j***@gmail.com
Right now I just would like to hear whether shifting unsigned values
until the bits are supposed to disappear, and testing them for zero,
is considered in the range of machine-defined stuff that the current
crop of optimization sophomores (pardon my Latin) conflates with
undefined.
Shifting unsigned values is well-defined provided the shift is (a)
non-negative and (b) less than the type's width. Your shift, 1, is always
fine. But, you are *not* shifting unsigned values in the first loop!
unsigned char promotes to (signed) int so you are shifting a positive
signed value by 1.
Unsigned char promoted to signed int?

That explains what I was fussing with so long ago when I made two different implementations of ctype for shift-JIS characters.
Post by Ben Bacarisse
That is also fine provided the result can be
represented as a signed int. Unless you have a very odd architecture,
the loop will stop before that can happen. So again, fine.
Yeah, char should have been called byte so that they could call real character types char.
Post by Ben Bacarisse
Post by j***@gmail.com
Or did I walk off the world when I shifted one too far and filled the
unsigned char with ones and it overflowed? (I'll look at that some
more now.)
Just assign -1. When -1 is assigned to the unsigned char (or you use a
cast since the unsigned char variable is not really needed anymore) the
result is reduced modulo one more than the maximum value that can be
represented.
--
Ben.
Thanks for walking me through this.

--
Joel Rees

Chasing windmills and tilting at rainbows
http://defining-computers.blogspot.com/2017/04/lsb-1st-vs-msb-1st-ignore-gulliver.html
Keith Thompson
2017-04-27 22:13:46 UTC
***@gmail.com writes:
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.

The behavior of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.

(I don't recall that I've ever felt the need to left-shift a negative
integer.)

The behavior of converting an integer (either signed or unsigned) value
to an unsigned type is well defined; the result is adjusted modulo
MAX+1, where MAX is the maximum value of the unsigned type. This is
described in N1570 6.3.1.3p2.

(N1570 is the latest publicly available draft of the ISO C standard,
available at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf .)
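For example (a small sketch; the concrete values assume the common 8-bit char and 32-bit int, which the rule itself does not depend on):

----------------
unsigned char uc = -1;             /* becomes UCHAR_MAX, e.g. 255      */
unsigned int  ui = -1;             /* becomes UINT_MAX                 */
unsigned int  uj = (unsigned)-5;   /* becomes UINT_MAX + 1 - 5         */
----------------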

[...]
Post by j***@gmail.com
Unsigned char promoted to signed int?
Yes. The promotion rules are in N1570 6.3.1.1p2. The idea is that
there are no arithmetic operations on types narrower than int or
unsigned int, so anything narrower must be "promoted" before you can
operate on it.

If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is converted to
an int; otherwise, it is converted to an unsigned int. These are
called the *integer promotions*.
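A two-line sketch of the promotion at work (assuming the usual case where int can represent every unsigned char value):

----------------
unsigned char c = 0x80;
int r = c << 1;    /* c promotes to int 0x80; the shift is done in int */
                   /* arithmetic and r is 0x100                        */
----------------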

The 1989 ANSI C committee had to choose between two schemes, both
implemented by existing compilers at the time: "unsigned preserving" and
"value preserving". They chose the latter. (I think I would have
preferred unsigned preserving semantics myself, but I haven't taken the
time to think about possible disadvantages.) This is discussed in the
C99 Rationale, section 6.3.1.1, available at
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf .
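A sketch of where the two schemes diverge (value preserving is what ISO C does; the unsigned-preserving result is shown only for contrast, and assumes short is narrower than int):

----------------
unsigned short us = 1;
int diff = us - 2;   /* value preserving (ISO C): us promotes to int,  */
                     /* so diff is -1                                  */
                     /* unsigned preserving: us would promote to       */
                     /* unsigned int and us - 2 would wrap to UINT_MAX */
----------------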

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-27 22:58:05 UTC
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
The behavior of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
I don't remember ever wanting to shift a signed integer, left or right,
except perhaps an integer constant (such as "1 << 12") without bothering
to add the "u" suffix. Bit manipulation, such as shifts and logical bit
operations, is usually most natural on unsigned types in my experience.
Keith Thompson
2017-04-27 23:44:30 UTC
Post by David Brown
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
The behavior of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
I don't remember ever wanting to shift a signed integer, left or right,
except perhaps an integer constant (such as "1 << 12") without bothering
to add the "u" suffix. Bit manipulation, such as shifts and logical bit
operations, is usually most natural on unsigned types in my experience.
It can be tempting to use "<<" as a replacement for multiplying by
a power of 2. For example, one might write `n << 3` rather than
`n * 8`. But it's UB if n is negative -- and it's usually better
to write `n * 8` and let the compiler figure out whether it makes
sense to use a shift instruction. (Using shifts for multiplications
by powers of 2 is an old trick, appropriate for old compilers that
didn't do that kind of optimization.)
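A sketch of the point (a hypothetical function; any current optimising compiler will emit a shift here when that is profitable):

----------------
long scale_by_8( long n )
{
    return n * 8;   /* defined for all n short of overflow;        */
                    /* n << 3 would be undefined for negative n    */
}
----------------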
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
j***@gmail.com
2017-04-28 03:04:18 UTC
Post by Keith Thompson
Post by David Brown
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
The behavior of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
I don't remember ever wanting to shift a signed integer, left or right,
except perhaps an integer constant (such as "1 << 12") without bothering
to add the "u" suffix. Bit manipulation, such as shifts and logical bit
operations, is usually most natural on unsigned types in my experience.
It can be tempting to use "<<" as a replacement for multiplying by
a power of 2. For example, one might write `n << 3` rather than
`n * 8`. But it's UB if n is negative -- and it's usually better
to write `n * 8` and let the compiler figure out whether it makes
sense to use a shift instruction. (Using shifts for multiplications
by powers of 2 is an old trick, appropriate for old compilers that
didn't do that kind of optimization.)
"Old" is such a relative term.

Particularly when you're bootstrapping a compiler for a new architecture.

Ideally, no. But practically, yes. Sometimes you develop new things without the budget for the latest/greatest, hoping to cover the lack of budget by shortening your expected lifetime with planned unplanned overtime.

But, of course, the standard does not have to be adhered to when building bootstrapping compilers, you just have to get comfortable with the idea that you'll be writing a lot of throwaway code.

Which is also not really a big deal, especially considering all the bad novels that get written/published. (And bad blog posts and wasted flame wars on newsgroups, ... :-/)

--
Joel Rees

Delusions of being a novelist:
http://joel-rees-economics.blogspot.com/2017/01/soc500-00-00-toc.html
David Brown
2017-04-28 06:53:43 UTC
Post by Keith Thompson
Post by David Brown
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
The behavior of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
I don't remember ever wanting to shift a signed integer, left or right,
except perhaps an integer constant (such as "1 << 12") without bothering
to add the "u" suffix. Bit manipulation, such as shifts and logical bit
operations, is usually most natural on unsigned types in my experience.
It can be tempting to use "<<" as a replacement for multiplying by
a power of 2. For example, one might write `n << 3` rather than
`n * 8`. But it's UB if n is negative -- and it's usually better
to write `n * 8` and let the compiler figure out whether it makes
sense to use a shift instruction. (Using shifts for multiplications
by powers of 2 is an old trick, appropriate for old compilers that
didn't do that kind of optimization.)
I used to work with compilers that did little in the way of
optimisation, and would do this kind of "hand optimisation". Such tools
were usually for small microcontrollers which may not have things like a
multiply instruction - "(x << 2) + x" could be significantly faster than
"x * 5". Almost certainly, "x" would have been an unsigned type in such
cases - usually uint8_t (even if I had to define that type myself
because the compiler had no <stdint.h>).

These days, if I want to multiply n by 8 I write "n * 8" - and let the
compiler figure out the code.

And if I want to multiply n by 2^17, I write "n * (1u << 17)" or "n *
0x20000" rather than "n << 17". I only write "n << 17" if I want to
shift n left 17 bits. Some people might find the distinction odd -
after all, the result (both mathematically and in generated code) is the
same. But my emphasis is on the logic and meaning of the code - in one
case, I am thinking about scaling a number, and in the other case I am
thinking of moving bitfields within a bit string.
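A sketch of that stylistic split (hypothetical functions; both shifts are on unsigned operands and assume the usual 32-bit unsigned int, so both are well defined):

----------------
unsigned scale( unsigned n )
{ return n * (1u << 17); }      /* arithmetic intent: scale n by 2^17  */

unsigned pack( unsigned hi, unsigned lo )
{ return ( hi << 17 ) | lo; }   /* bit-level intent: place a bit-field */
----------------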
s***@casperkitty.com
2017-04-28 15:23:20 UTC
Post by David Brown
I used to work with compilers that did little in the way of
optimisation, and would do this kind of "hand optimisation". Such tools
were usually for small microcontrollers which may not have things like a
multiply instruction - "(x << 2) + x" could be significantly faster than
"x * 5". Almost certainly, "x" would have been an unsigned type in such
cases - usually uint8_t (even if I had to define that type myself
because the compiler had no <stdint.h>).
For multiplication by a constant, I generally prefer to use multiplication
operator since compilers can pretty consistently figure that out. If the
scale factor will always be a power of two, but *which* power of two may
vary, and if there's no need to even think about running the code on any
machine that doesn't use two's-complement math, I would regard x<<y as
being clearer than x*(1<<y). Note that the former has a defined case where
the latter doesn't, and on implementations which don't go out of their way
not to expose useful platform behavior it doesn't require knowing the type
of y.

Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
then if "x" is "long long", then a compiler would likely have to generate
code for x*(1<<y) that was equivalent to (y!=31) ? x<<y : -(x<<y), which
would almost certainly be less efficient than x<<y.
Manfred
2017-04-28 16:05:12 UTC
Post by s***@casperkitty.com
Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
hmm.
I think signed 32-bit int 1<<31 would be -2147483648 on most platforms
Keith Thompson
2017-04-28 16:22:24 UTC
***@casperkitty.com writes:
[...]
Post by s***@casperkitty.com
Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
then if "x" is "long long", then a compiler would likely have to generate
code for x*(1<<y) that was equivalent to (y!=31) ? x<<y : -(x<<y), which
would almost certainly be less efficient than x<<y.
No, 1<<31 has undefined behavior in C++ if 2147483648 cannot be
represented as an int. (C++11 standard, section 5.8 [expr.shift],
unchanged in N4618.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2017-04-28 19:24:04 UTC
Post by Keith Thompson
[...]
Post by s***@casperkitty.com
Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
then if "x" is "long long", then a compiler would likely have to generate
code for x*(1<<y) that was equivalent to (y!=31) ? x<<y : -(x<<y), which
would almost certainly be less efficient than x<<y.
No, 1<<31 has undefined behavior in C++ if 2147483648 cannot be
represented as an int. (C++11 standard, section 5.8 [expr.shift],
unchanged in N4618.)
I was mistaken on this point. N4618, which is a draft of C++17, makes
1<<31 equivalent to (int)2147483648U (for 32-bit int and unsigned int).
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ben Bacarisse
2017-04-28 23:03:13 UTC
Post by Keith Thompson
Post by Keith Thompson
[...]
Post by s***@casperkitty.com
Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
then if "x" is "long long", then a compiler would likely have to generate
code for x*(1<<y) that was equivalent to (y!=31) ? x<<y : -(x<<y), which
would almost certainly be less efficient than x<<y.
No, 1<<31 has undefined behavior in C++ if 2147483648 cannot be
represented as an int. (C++11 standard, section 5.8 [expr.shift],
unchanged in N4618.)
I was mistaken on this point. N4618, which is a draft of C++17, makes
1<<31 equivalent to (int)2147483648U (for 32-bit int and unsigned int).
Harmonising C and C++ was proposed in DR463. The response was

"This is not a defect.
The committee will track this and consider it for the next revision of
the standard."
--
Ben.
s***@casperkitty.com
2017-04-28 23:37:46 UTC
Post by Ben Bacarisse
Harmonising C and C++ was proposed in DR463. The response was
"This is not a defect.
The committee will track this and consider it for the next revision of
the standard."
There are sound reasons why certain C [or for that matter C++]
implementations should be allowed to evaluate the expression in a way
other than as described in the present C++ Standard. There are also
sound reasons why implementations should not be required to reject
such code in templates.

Both needs could be accommodated by saying that an implementation must
define macros to indicate or validate whether it treats certain constructs
as yielding particular defined behaviors. An implementation would be free
to report that 1<<31 will yield a defined value, and allow that expression
within template expansions, or to report that it will not yield a defined
value and not consider any template expansions which would try to evaluate
it.

If an implementation specifies that e.g. x<<32 might yield either 0 or
x, chosen in Unspecified fashion, the C++ Standard might need to specify
whether an implementation would be allowed or required to reject templates
wherein the expressions 0<<32 or (x>>0)|(x<<32) appear. In the former case,
the fact that the left operand is zero would make it obvious that yielding
0 and yielding x are equivalent. In the latter case, the expression would
yield the same value regardless of whether x<<32 yielded 0 or x, but it would
be a bit less obvious. I think allowing implementations to accept or reject
such templates is probably the best approach (since in most cases where such
ambiguities would arise, either choice would probably work just fine) but
can imagine that it might be necessary to ensure consistency within a given
implementation. That level of detail would really only be important for C++,
however, rather than for C.
David Kleinecke
2017-04-29 00:09:46 UTC
Post by s***@casperkitty.com
Post by Ben Bacarisse
Harmonising C and C++ was proposed in DR463. The response was
"This is not a defect.
The committee will track this and consider it for the next revision of
the standard."
There are sound reasons why certain C [or for that matter C++]
implementations should be allowed to evaluate the expression in a way
other than as described in the present C++ Standard. There are also
sound reasons why implementations should not be required to reject
such code in templates.
Both needs could be accommodated by saying that an implementation must
define macros to indicate or validate whether it treats certain constructs
as yielding particular defined behaviors. An implementation would be free
to report that 1<<31 will yield a defined value, and allow that expression
within template expansions, or to report that it will not yield a defined
value and not consider any template expansions which would try to evaluate
it.
If an implementation specifies that e.g. x<<32 might yield either 0 or
x, chosen in Unspecified fashion, the C++ Standard might need to specify
whether an implementation would be allowed or required to reject templates
wherein the expressions 0<<32 or (x>>0)|(x<<32) appear. In the former case,
the fact that the left operand is zero would make it obvious that yielding
0 and yielding x are equivalent. In the latter case, the expression would
yield the same value regardless of whether x<<32 yielded 0 or x, but it would
be a bit less obvious. I think allowing implementations to accept or reject
such templates is probably the best approach (since in most cases where such
ambiguities would arise, either choice would probably work just fine) but
can imagine that it might be necessary to ensure consistency within a given
implementation. That level of detail would really only be important for C++,
however, rather than for C.
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away. But that must have been too easy when the
standard was developed.

PS: And even x<<(-n) as x/(2^n) and so on.
s***@casperkitty.com
2017-04-29 00:34:40 UTC
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away. But that must have been too easy when the
standard was developed.
If there exists a ones'-complement platform whose fastest way of
shifting left shifts in zeroes, and another where the fastest
shift operation would copy the sign bit, it would be logical for
implementations on each platform to, by default, do the shift in
whichever way the platform can process most efficiently, and I
doubt the authors of C89 would have wanted to compel them to do
otherwise. Note, however, that shift-right of negative numbers
should definitely not be equivalent to using the "/" operator,
since in two's-complement form, ~8>>1 [i.e. -9] should be ~4
[i.e. -5], not ~3 [i.e. -4].
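A sketch of that last point (the right shift of a negative value is implementation-defined, but the usual two's-complement machines with arithmetic shifts behave as shown):

----------------
int a = -9 >> 1;   /* typically -5: the shift rounds toward minus   */
                   /* infinity (implementation-defined result)      */
int b = -9 / 2;    /* -4: "/" truncates toward zero (C99 and later) */
----------------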
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
On many platforms, that would be expensive.
Ben Bacarisse
2017-04-29 00:43:10 UTC
David Kleinecke <***@gmail.com> writes:
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
--
Ben.
David Kleinecke
2017-04-29 04:21:40 UTC
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.

Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
j***@gmail.com
2017-04-29 09:04:19 UTC
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.
Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
Unless something has changed since the last time I read the
standard (more than two back, apparently), ternary and base
four were intended to be covered.

It seems like there was at least a draft that even took up C
for a native decimal CPU.

That's probably where the conflation of shifting with multiplying
by a power of two came from.

--
Joel Rees
Ben Bacarisse
2017-04-29 10:46:09 UTC
***@gmail.com writes:
<snip>
Post by j***@gmail.com
Unless something has changed since the last time I read the
standard (more than two back, apparently), ternary and base
four were intended to be covered.
A C implementation must behave as if its integer types are binary.
Provided that's done correctly, almost any base could be used by the
hardware so there's no need to single out ternary and base 4.

In fact, I don't recall any version of the C standard giving special
permission for bases 3 and 4. What parts of the language are you
referring to?

<snip>
--
Ben.
Robert Wessel
2017-04-30 17:05:05 UTC
Post by j***@gmail.com
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.
Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
Unless something has changed since the last time I read the
standard (more than two back, apparently), ternary and base
four were intended to be covered.
It seems like there was at least a draft that even took up C
for a native decimal CPU.
That's probably where the conflation of shifting with multiplying
by a power of two came from.
As of C99, the implementation must act as if integer types were
binary, with signed integers being ones' complement, two's complement
or sign-magnitude. Padding bits and such are still allowed.
Ben Bacarisse
2017-04-29 10:33:15 UTC
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.
Your 2017 criticism of the C standard from the late 1980s was based on
what *might* be the future of quantum computers?
Post by David Kleinecke
Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
Obviously it was never intended for such machines and I can't believe
you really need me to say why!
--
Ben.
David Kleinecke
2017-04-29 15:26:00 UTC
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problems were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.
Your 2017 criticism of the C standard from the late 1980s was based on
what *might* be the future of quantum computers?
Post by David Kleinecke
Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
Obviously it was never intended for such machines and I can't believe
you really need me to say why!
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
David Brown
2017-04-29 15:58:05 UTC
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.

If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.


I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.

Still, it's good that time, effort and money is used on these - it's all
good physics research.

(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
David Kleinecke
2017-04-29 16:13:05 UTC
Post by David Brown
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.
If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.
I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.
Still, it's good that time, effort and money is used on these - it's all
good physics research.
(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
I haven't seen any reason to doubt that C would be appropriate -
in the sense I use C (see my other post). Could you describe a
scenario where it fails?
David Brown
2017-04-30 10:55:43 UTC
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.
If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.
I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.
Still, it's good that time, effort and money is used on these - it's all
good physics research.
(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
I haven't seen any reason to doubt that C would be appropriate -
in the sense I use C (see my other post). Could you describe a
scenario where it fails?
Quantum computing uses a completely different computation model than
classical computing. There are many different computation models -
neural networks, quantum computing, genetic computing, cellular
automation, etc. They are all able to compute the same functions, but
there can be wide variations in their efficiency for different types of
algorithm (but as far as I know, the P/NP split is the same for all).

Quantum computers are not, at the moment, "programmed" as such - they
are /built/ to execute a particular algorithm. They are more akin to
the analogue computers of the 1950's - except they are of far less use.

Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?

This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time. You can't load and store qubit
values, but you can pass them through fixed operations. The result is
the "sum" of all possible combinations of all states of the qubits going
into the operation. When you take two qubits that are "holding" 4 bits
of state, and you add them, the resulting qubit contains the 256
different results of adding all combinations of two 4 bit values.
/That/ is the key to being able to scale difficult algorithms like prime
factorisation. But if your IO hardware can only distinguish 16
different discrete logic states, then of those 256 theoretical output
states you can only read out 16 of them - and figuring out /which/ 16
you get is extremely difficult. And /that/ is key to why quantum
computers are of very little practical use, and will continue to be so
for the foreseeable future (IMHO).
David Kleinecke
2017-04-30 16:17:42 UTC
Post by David Brown
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.
If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.
I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.
Still, it's good that time, effort and money is used on these - it's all
good physics research.
(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
I haven't seen any reason to doubt that C would be appropriate -
in the sense I use C (see my other post). Could you describe a
scenario where it fails?
Quantum computing uses a completely different computation model than
classical computing. There are many different computation models -
neural networks, quantum computing, genetic computing, cellular
automata, etc. They are all able to compute the same functions, but
there can be wide variations in their efficiency for different types of
algorithm (but as far as I know, the P/NP split is the same for all).
Quantum computers are not, at the moment, "programmed" as such - they
are /built/ to execute a particular algorithm. They are more akin to
the analogue computers of the 1950's - except they are of far less use.
Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?
This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time. You can't load and store qubit
values, but you can pass them through fixed operations. The result is
the "sum" of all possible combinations of all states of the qubits going
into the operation. When you take two qubits that are "holding" 4 bits
of state, and you add them, the resulting qubit contains the 256
different results of adding all combinations of two 4 bit values.
/That/ is the key to being able to scale difficult algorithms like prime
factorisation. But if your IO hardware can only distinguish 16
different discrete logic states, then of those 256 theoretical output
states you can only read out 16 of them - and figuring out /which/ 16
you get is extremely difficult. And /that/ is key to why quantum
computers are of very little practical use, and will continue to be so
for the foreseeable future (IMHO).
I don't consider current quantum computers serious
implementations of quantum possibilities. I remain
looking into the rather distant future and trying to
guess where quantum computing will end up. As I see
it there is still a non-trivial probability that
quantum computing will never be useful but enough
likelihood that it will that we should plan for it.
David Brown
2017-04-30 18:35:31 UTC
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.
If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.
I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.
Still, it's good that time, effort and money is used on these - it's all
good physics research.
(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
I haven't seen any reason to doubt that C would be appropriate -
in the sense I use C (see my other post). Could you describe a
scenario where it fails?
Quantum computing uses a completely different computation model than
classical computing. There are many different computation models -
neural networks, quantum computing, genetic computing, cellular
automata, etc. They are all able to compute the same functions, but
there can be wide variations in their efficiency for different types of
algorithm (but as far as I know, the P/NP split is the same for all).
Quantum computers are not, at the moment, "programmed" as such - they
are /built/ to execute a particular algorithm. They are more akin to
the analogue computers of the 1950's - except they are of far less use.
Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?
This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time. You can't load and store qubit
values, but you can pass them through fixed operations. The result is
the "sum" of all possible combinations of all states of the qubits going
into the operation. When you take two qubits that are "holding" 4 bits
of state, and you add them, the resulting qubit contains the 256
different results of adding all combinations of two 4 bit values.
/That/ is the key to being able to scale difficult algorithms like prime
factorisation. But if your IO hardware can only distinguish 16
different discrete logic states, then of those 256 theoretical output
states you can only read out 16 of them - and figuring out /which/ 16
you get is extremely difficult. And /that/ is key to why quantum
computers are of very little practical use, and will continue to be so
for the foreseeable future (IMHO).
I don't consider current quantum computers serious
implementations of quantum possibilities. I remain
looking into the rather distant future and trying to
guess where quantum computing will end up.
Do you know what the current state of quantum computing is? Did I
provide a fair summary of what you think it is? Did my last paragraph
help you understand it? Can you give any kind of description about what
you imagine quantum computing will look like in the future (near or
distant)? You have said a number of times that you see no reason why C
should not be a suitable language for programming quantum computers - to
make that statement, you must have a solid basis for how you think they
will work.
Post by David Kleinecke
As I see
it there is still a non-trivial probability that
quantum computing will never be useful but enough
likelihood that it will that we should plan for it.
In what way should we plan for it?

(If you feel you have dug yourself into a hole here, by speculating
about things that are beyond your knowledge and making claims based on
"gut feelings" rather than justifiable and informed opinion, then feel
free to back out. I know from personal experience that it can be easier
to keep going with more speculation - but it is better to stop digging
before things get silly.)
David Kleinecke
2017-04-30 22:39:06 UTC
Post by David Brown
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
Post by David Brown
Post by David Kleinecke
It wasn't 2017 - that's this year. I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
Do you know anything about quantum computers, how they work, what they
can be used for, and how they are different from "classical" computers?
I can't say I am an expert on quantum computing, but all that I have
read about them suggests that C would be a totally inappropriate way to
program them.
If you have any reason for supposing that C /would/ be appropriate for
programming them, I'd like to hear about it.
I am also highly sceptical that true quantum computers will ever make
economic sense. That does not mean that some people will not make a
great deal of money from making them and selling them - merely that I do
not foresee a time when it is cheaper to do a given calculation on
quantum computers rather than classical ones.
Still, it's good that time, effort and money is used on these - it's all
good physics research.
(The current commercial so-called "quantum computers" from D-Wave are
not true quantum computers as such, and while D-Wave makes good money
from selling them, they don't offer any speed advantages to customers.)
I haven't seen any reason to doubt that C would be appropriate -
in the sense I use C (see my other post). Could you describe a
scenario where it fails?
Quantum computing uses a completely different computation model than
classical computing. There are many different computation models -
neural networks, quantum computing, genetic computing, cellular
automata, etc. They are all able to compute the same functions, but
there can be wide variations in their efficiency for different types of
algorithm (but as far as I know, the P/NP split is the same for all).
Quantum computers are not, at the moment, "programmed" as such - they
are /built/ to execute a particular algorithm. They are more akin to
the analogue computers of the 1950's - except they are of far less use.
Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?
This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time. You can't load and store qubit
values, but you can pass them through fixed operations. The result is
the "sum" of all possible combinations of all states of the qubits going
into the operation. When you take two qubits that are "holding" 4 bits
of state, and you add them, the resulting qubit contains the 256
different results of adding all combinations of two 4 bit values.
/That/ is the key to being able to scale difficult algorithms like prime
factorisation. But if your IO hardware can only distinguish 16
different discrete logic states, then of those 256 theoretical output
states you can only read out 16 of them - and figuring out /which/ 16
you get is extremely difficult. And /that/ is key to why quantum
computers are of very little practical use, and will continue to be so
for the foreseeable future (IMHO).
I don't consider current quantum computers serious
implementations of quantum possibilities. I remain
looking into the rather distant future and trying to
guess where quantum computing will end up.
Do you know what the current state of quantum computing is? Did I
provide a fair summary of what you think it is? Did my last paragraph
help you understand it? Can you give any kind of description about what
you imagine quantum computing will look like in the future (near or
distant)? You have said a number of times that you see no reason why C
should not be a suitable language for programming quantum computers - to
make that statement, you must have a solid basis for how you think they
will work.
Post by David Kleinecke
As I see
it there is still a non-trivial probability that
quantum computing will never be useful but enough
likelihood that it will that we should plan for it.
In what way should we plan for it?
(If you feel you have dug yourself into a hole here, by speculating
about things that are beyond your knowledge and making claims based on
"gut feelings" rather than justifiable and informed opinion, then feel
free to back out. I know from personal experience that it can be easier
to keep going with more speculation - but it is better to stop digging
before things get silly.)
I haven't spent much time on this speculation - and I
don't plan to. I am sort of going to back out because
I have nothing more to say.

By planning for it I mean we should avoid locking in things
that might cripple it. Like assuming information atoms must be
binary.
David Brown
2017-05-01 14:21:18 UTC
Permalink
Post by David Kleinecke
I haven't spent much time on this speculation - and I
don't plan to. I am sort of going to back out because
I have nothing more to say.
Fair enough - I will just make a brief comment on the last part.
Post by David Kleinecke
By planning for it I mean we should avoid locking in things
that might cripple it. Like assuming information atoms must be
binary.
Some programming languages make assumptions that the underlying hardware
operates in binary - but for most, it is sufficient that you can work
with some form of integers with particular ranges. C, for example, does
/not/ require that the basic data format be in binary - but it /does/
require that you can read and write all data in blocks that support a
fixed power of two different values. It does not matter if an "unsigned
char" is 8 binary bits, 27 binary bits, 6 base-4 bitpairs, 3 base 16
nibbles, or whatever - as long as it can represent a set of values from
0 to 2^n - 1 for a fixed n >= 8, with no padding or missing codes.

There are some programming languages that work well with packed decimal
coding of some sort. However, very early in the days of programmable
digital computers it became obvious that binary was far and away the
most practical choice of base, and we have stuck to that.

However, the ability to work directly with integer-based numbers in a
deterministic manner is key to the way most programming languages work.
To deal with different computing models - neural nets, cellular
automata, quantum computers, etc., - you need very, very different types
of programming languages. It does not "cripple" a programming language
for classical computers if it is limited to classical computers -
anything that works well on classical computers cannot possibly work
well on quantum computers. Your suggestion is akin to saying that car
designers should not cripple their designs by locking them to driving on
roads and the ground, when in the future we might want to travel
underwater more often.
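(Editorial sketch, not part of the original post: one way to see that guarantee in code, assuming a C11 compiler for _Static_assert and CHAR_BIT < 64.)

#include <limits.h>

/* Whatever the hardware's native base, C requires unsigned char to behave
   as a pure binary type with no padding: its maximum value is 2^CHAR_BIT - 1. */
_Static_assert(UCHAR_MAX == (1ull << CHAR_BIT) - 1u,
               "unsigned char is not a pure binary type");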
Malcolm McLean
2017-05-02 13:35:40 UTC
Permalink
Post by David Brown
Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?
This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time.
I had a look at quantum mechanics as part of my PhD, though we decided in
pretty short order not to go there.

My understanding is that the whole point is that it is quantised - the number
of allowed states is finite, though very large, and the system is in all of the
states simultaneously. So a qbit would have two states, on and off, and a one
qbit quantum computer is exactly the same as a 1 bit conventional computer.
A 2 qbit quantum computer, however, evaluates 0 0, 0 1, 1 0, and 1 1 simultaneously,
so it's four times as powerful as the equivalent binary computer.
David Brown
2017-05-02 14:29:28 UTC
Permalink
Post by Malcolm McLean
Post by David Brown
Based on your post, you seem to be under the impression that the
difference between a quantum computer and a classical one is that the
quantum computer has "qubits" that hold several normal bits of
information in one lump - and that otherwise, it is much like a normal
computer. You think that a quantum computer might hold 4 bits of binary
information per qubit - that its "bits" hold values between 0 and 15
rather than between 0 and 1. And that other than that, you can perform
arithmetic, logic, data movement, program jumps, and everything else on
them, just that the smallest lump of data you handle is 4 bits at a time?
This is totally and completely wrong. In a quantum computer, a qubit
holds a superposition of an unlimited number of states - it is only the
resolution and accuracy of the system feeding data into and getting data
out of the qubit that limits the useful number of states, combined with
the stability of the qubit over time.
I had a look at quantum mechanics as part of my PhD, though we decided in
pretty short order not to go there.
My understanding is that the whole point is that it is quantised - the number
of allowed states is finite, though very large, and the system is in all of the
states simultaneously. So a qbit would have two states, on and off, and a one
qbit quantum computer is exactly the same as a 1 bit conventional computer.
A 2 qbit quantum computer, however, evaluates 0 0, 0 1, 1 0, and 1 1 simultaneously,
so it's four times as powerful as the equivalent binary computer.
It is not that simple. (For any explanation in quantum "stuff", no
matter how complex the explanation, reality is /never/ as simple as the
explanation.)

In some aspects of the quantum world, you are correct - things happen in
discrete steps. Energy levels come in lumps, charge is always an
integer (except for quarks, where it is always an integer multiple of
1/3), and so on.

In other aspects, things that would be fixed or have a single defining
value in the non-quantum world now have waves, probability functions,
and other types of "value". You no longer have known positions or
speeds for particles - you have probability density functions.

A qubit does not have a finite number of states - it has an infinite
number. It's "pure qubit state" is a linear combination of a "0" and a
"1", where the two scaling factors are complex numbers with the sum of
the squares of their amplitudes being 1 (since it is a probability).
This is already very, very far away from a single digital bit.

And then the qubit can be in a mixed state of different "pure" states -
and things really start to get complicated (and far beyond /my/ meagre
understanding).
Ben Bacarisse
2017-04-29 19:58:42 UTC
Permalink
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by David Brown
<snip>
Post by David Kleinecke
One could define x<<n as x*(2^n) and x>>n as x/(2^n) and any
problems go away.
For the cases that are well-defined, that is exactly how they are
defined. But that would *cause* problem were that extended to the other
cases. The committee wanted to avoid imposing a burden on
implementations for hardware that did not match C's definition of a
shift.
Post by David Kleinecke
But that must have been too easy when the standard was developed.
That's implausible. The core meaning is defined exactly as you say, but
the extra bits are not there because the committee never thought to
remove them.
Post by David Kleinecke
PS: And even x<<(-n) as x/(2^n) and so on.
It is not in the spirit of C to impose that sort of cost on a primitive
operation like a bit shift.
I admit I was thinking about computers which work on atoms
of information other than 2 -like quantum computers seem to
be headed for. If information is not expressed in bits it is
hard to say what all those edge cases "should" do.
Your 2017 criticism of the C standard from the late 1980s was based on
what *might* be the future of quantum computers?
Post by David Kleinecke
Or I suppose you could claim that C was never intended for
such hardware. If so - why not?
Obviously it was never intended for such machines and I can't believe
you really need me to say why!
It wasn't 2017 - that's this year.
"It" -- your criticism of the C standardisation process -- *was* this
year.
Post by David Kleinecke
I am interested in the long
range future of computing. I see no reason why C (as I understand
it - not necessarily the standard) couldn't be used to program
quantum computers. Regardless of what it was intended for back
then.
I was interested in what you said in the message I replied to. If you
have nothing more to say about that, I'm happy to leave it at that.
--
Ben.
David Brown
2017-04-28 16:43:09 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
I used to work with compilers that did little in the way of
optimisation, and would do this kind of "hand optimisation". Such tools
were usually for small microcontrollers which may not have things like a
multiply instruction - "(x << 2) + x" could be significantly faster than
"x * 5". Almost certainly, "x" would have been an unsigned type in such
cases - usually uint8_t (even if I had to define that type myself
because the compiler had no <stdint.h>).
For multiplication by a constant, I generally prefer to use multiplication
operator since compilers can pretty consistently figure that out.
So do I - with decent compilers. But I was talking here about compilers
that did little in the way of optimisation, including converting
multiplication by a constant into shifts and/or adds.
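(Editorial sketch, not part of the original post: the kind of hand optimisation being described, written with an unsigned type so the shift is fully defined.)

/* On a small CPU with no multiply instruction, x * 5 can be written by
   hand as a shift plus an add. */
static unsigned mul_by_5(unsigned x)
{
    return (x << 2) + x;   /* x*4 + x */
}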
Post by s***@casperkitty.com
If the
scale factor will always be a power of two, but *which* power of two may
vary, and if there's no need to even think about running the code on any
machine that doesn't use two's-complement math, I would regard x<<y as
being clearer than x*(1<<y). Note that the former has a defined case where
the latter doesn't, and on implementations which don't go out of their way
not to expose useful platform behavior it doesn't require knowing the type
of y.
For which case(s) is (x << y) defined but (x * (1u << y)) is not? Do
you mean when x is long long (or unsigned long long) ?
Post by s***@casperkitty.com
Incidentally, if a platform with 32-bit "int" defines 1<<31 as -1 (likely
on platforms that also handle C++, as C++ would require such behavior),
That would be a very odd choice of definition for 1 << 31. In hex, it
is 0x8000'0000, which is -2147483648 in two's complement 32-bit signed
arithmetic.

And as far as I can tell, C++ has exactly the same behaviour for shifts
as C, including exactly the same undefined and implementation dependent
behaviour.
Post by s***@casperkitty.com
then if "x" is "long long", then a compiler would likely have to generate
code for x*(1<<y) that was equivalent to (y!=31) ? x<<y : -(x<<y), which
would almost certainly be less efficient than x<<y.
bartc
2017-04-28 13:09:06 UTC
Permalink
Post by Keith Thompson
Post by David Brown
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
The behavor of left-shifting a negative value is undefined.
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
I don't remember ever wanting to shift a signed integer, left or right,
except perhaps an integer constant (such as "1 << 12") without bothering
to add the "u" suffix. Bit manipulation, such as shifts and logical bit
operations, are usually most natural on unsigned types in my experience.
It can be tempting to use "<<" as a replacement for multiplying by
a power of 2. For example, one might write `n << 3` rather than
`n * 8`. But it's UB if n is negative -- and it's usually better
to write `n * 8` and let the compiler figure out whether it makes
sense to use a shift instruction.
With gcc (on my x86 machine), it uses left shift to multiply a signed
int by 8, one that it doesn't know whether it's negative or not.

It knows that left-shifting a negative int will work. I think it will
always work on two's complement (ie. give the same result as multiply),
and many will spend their entire working lives working with such
machines. Yet they are not allowed to assume that. Ever.

As for UB on left-shifting 3 places, isn't this the same UB you get when
multiplying by 8 and the result might overflow? In that case it doesn't
seem to matter which is used.

(BTW why didn't the language just prohibit shifts on signed values if it
frowns upon them so much?)
--
bartc
Scott Lurndal
2017-04-28 14:23:21 UTC
Permalink
Post by bartc
With gcc (on my x86 machine), it uses left shift to multiply a signed
int by 8, one that it doesn't know whether it's negative or not.
It knows that left-shifting a negative int will work.
No, it knows that the result is undefined (because of signed
underflow, in this case).

$ cat /tmp/a.c
int main(int argc, char **argv)
{
int fred = -2031524053;
int b = fred * 8;

printf ("%d\n", b);

return b;
}

0000000000400530 <main>:
400530: 55 push %rbp
400531: 48 89 e5 mov %rsp,%rbp
400534: 48 83 ec 20 sub $0x20,%rsp
400538: 89 7d ec mov %edi,-0x14(%rbp)
40053b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40053f: c7 45 fc 2b 67 e9 86 movl $0x86e9672b,-0x4(%rbp)
400546: 8b 45 fc mov -0x4(%rbp),%eax
==> 400549: c1 e0 03 shl $0x3,%eax
40054c: 89 45 f8 mov %eax,-0x8(%rbp)
40054f: 8b 45 f8 mov -0x8(%rbp),%eax
400552: 89 c6 mov %eax,%esi
400554: bf 00 06 40 00 mov $0x400600,%edi
400559: b8 00 00 00 00 mov $0x0,%eax
40055e: e8 ad fe ff ff          callq  400410 <printf@plt>
400563: 8b 45 f8 mov -0x8(%rbp),%eax
400566: c9 leaveq
400567: c3 retq
400568: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40056f: 00

$ /tmp/a
927676760
bartc
2017-04-28 15:17:01 UTC
Permalink
Post by Scott Lurndal
Post by bartc
With gcc (on my x86 machine), it uses left shift to multiply a signed
int by 8, one that it doesn't know whether it's negative or not.
It knows that left-shifting a negative int will work.
No, it knows that the result is undefined (because of signed
underflow, in this case).
$ cat /tmp/a.c
int main(int argc, char **argv)
{
int fred = -2031524053;
int b = fred * 8;
What are you saying, that ANY multiply of signed int is undefined?

With my test, the equivalent of your 'fred' was initialised from an
external function, and gcc still used a left-shift.
--
bartc
Scott Lurndal
2017-04-28 15:55:36 UTC
Permalink
Post by bartc
Post by Scott Lurndal
Post by bartc
With gcc (on my x86 machine), it uses left shift to multiply a signed
int by 8, one that it doesn't know whether it's negative or not.
It knows that left-shifting a negative int will work.
No, it knows that the result is undefined (because of signed
underflow, in this case).
$ cat /tmp/a.c
int main(int argc, char **argv)
{
int fred = -2031524053;
int b = fred * 8;
What are you saying, that ANY multiply of signed int is undefined?
I think I was pretty clear in the part of the article that
you snipped out. Any signed operations that can result in a signed underflow
or overflow result in undefined behavior. This allows the compiler to
use a left shift as a peephole optimization for multiplication
by powers of two without needing to worry about the fact that
the result may not be mathematically correct.

The _application programmer_ is responsible for ensuring that the
multiplicand and the multiplier don't result in
overflow/underflow when using signed integer arithmetic.
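(Editorial sketch, not part of the original post: what such a responsibility can look like in code, assuming multiplication by the constant 8; the range check happens before the multiply, so no signed overflow can occur.)

#include <limits.h>

/* Returns x * 8 and sets *ok to 1, or sets *ok to 0 if the product
   would not fit in an int. */
static int mul8_checked(int x, int *ok)
{
    if (x > INT_MAX / 8 || x < INT_MIN / 8) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return x * 8;
}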
Malcolm McLean
2017-04-29 02:11:46 UTC
Permalink
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
Ben Bacarisse
2017-04-29 10:23:36 UTC
Permalink
Post by Malcolm McLean
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
I'm not sure what your point is. It is always the responsibility of the
programmer to ensure that his or her code is correct. Are you just
saying that C does not help as much with that task as some higher-level
languages do?
--
Ben.
bartc
2017-04-29 10:31:20 UTC
Permalink
Post by Malcolm McLean
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
That would apply also when these 32 bit quantities are unsigned. As
could well be the case with things such as height and width.

It doesn't appear to have anything to do with the undefined behaviour
associated with the use of signed integers that everyone seems to be
obsessed with.

The fact that it only arises when they are used on ones' complement
and sign-magnitude systems (which seem to be imaginary systems that no
one ever uses) doesn't help. But *because* C has stipulated that such
operations are undefined behaviour, it can affect even code
that runs on the vast majority of systems that are two's complement,
because compilers can take 'advantage' of that.

The problem is that people go to the trouble of avoiding UB with signed
operations, then get lulled into a false sense of security when they use
unsigned.

So (32-bit) unsigned A and B are both 1000000, A*B is 3,567,587,328
rather than 1,000,000,000,000, but no problem because it's modular
arithmetic, and its behaviour is well defined. Wrong, but at least it's
not UB. It will be predictably wrong on all systems.
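(Editorial sketch, not part of the original post: the example above as a standalone program, assuming 32-bit unsigned int.)

#include <stdio.h>

int main(void)
{
    unsigned a = 1000000, b = 1000000;
    /* 10^12 reduced modulo 2^32 is 3567587328 - well defined,
       but probably not the value the programmer wanted. */
    printf("%u\n", a * b);
    return 0;
}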
--
Bartc
Ben Bacarisse
2017-04-29 11:01:09 UTC
Permalink
bartc <***@freeuk.com> writes:
<snip>
Post by bartc
So (32-bit) unsigned A and B are both 1000000, A*B is 3,567,587,328
rather than 1,000,000,000,000, but no problem because it's modular
arithmetic, and its behaviour is well defined. Wrong, but at least
it's not UB. It will be predictably wrong on all systems.
Why is it wrong? I like languages like Haskell where 10^6 * 10^6 ==
10^12 but that comes at a price I doubt you'd consider acceptable for
C. Slightly cheaper (though still unacceptable to most C programmers)
would be to insist that all such operations raise an exception.
The cheapest would be to have made unsigned overflow undefined too, but,
again, I doubt you'd want *more* UB in C. The current definition of
unsigned arithmetic is cheap to implement, easy to understand and very
useful. That's not what I'd call "wrong".
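(Editorial sketch, not part of the original post: one classic example of that usefulness - differencing a free-running tick counter, shown here with a fixed-width type.)

#include <stdint.h>

/* Elapsed ticks between two readings of a wrapping 32-bit counter.
   Because unsigned subtraction is defined modulo 2^32, the result is
   correct even if 'now' has wrapped past zero since 'then' was read. */
static uint32_t ticks_elapsed(uint32_t now, uint32_t then)
{
    return now - then;
}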
--
Ben.
bartc
2017-04-29 11:47:57 UTC
Permalink
Post by Ben Bacarisse
<snip>
Post by bartc
So (32-bit) unsigned A and B are both 1000000, A*B is 3,567,587,328
rather than 1,000,000,000,000, but no problem because it's modular
arithmetic, and its behaviour is well defined. Wrong, but at least
it's not UB. It will be predictably wrong on all systems.
Why is it wrong? I like languages like Haskell where 10^6 * 10^6 ==
10^12 but that comes at a price I doubt you'd consider acceptable for
C. Slightly cheaper (though still unacceptable to most C programmers)
would be to insist that all such operations raise an exception.
The cheapest would be to have made unsigned overflow undefined too, but,
again, I doubt you'd want *more* UB in C. The current definition of
unsigned arithmetic is cheap to implement, easy to understand and very
useful. That's not what I'd call "wrong".
It's wrong because it gives the wrong answer, and is likely an error in
the program if it was intended to be mathematically correct.

My point is that the emphasis seems to be on avoiding undefined
behaviour rather than getting correct behaviour.

I would call that erroneous result above 'overflow', but C suggests that
that is not possible. The answer is still wrong however if A and B
represent, say, width and height of some rectangle, and you want to work
out its area.

(Solutions needn't be expensive when a wider type is available, eg. 64
bits when A and B are 32.)
--
bartc
David Brown
2017-04-29 13:20:17 UTC
Permalink
Post by bartc
Post by Ben Bacarisse
<snip>
Post by bartc
So (32-bit) unsigned A and B are both 1000000, A*B is 3,567,587,328
rather than 1,000,000,000,000, but no problem because it's modular
arithmetic, and its behaviour is well defined. Wrong, but at least
it's not UB. It will be predictably wrong on all systems.
Why is it wrong? I like languages like Haskell where 10^6 * 10^6 ==
10^12 but that comes at a price I doubt you'd consider acceptable for
C. Slightly cheaper (though still unacceptable to most C programmers)
would be to insist that all such operations raise an exception.
The cheapest would be to have made unsigned overflow undefined too, but,
again, I doubt you'd want *more* UB in C. The current definition of
unsigned arithmetic is cheap to implement, easy to understand and very
useful. That's not what I'd call "wrong".
It's wrong because it gives the wrong answer, and is likely an error in
the program if it was intended to be mathematically correct.
What would, in your opinion, be the "right" answer - given the
constraint that the value has to be stored in a 32-bit integer (signed
or unsigned - you pick) ?

Overflows are always going to be "wrong" in some way for some uses -
there is no possibility of making them entirely correct. You cannot
store 10^12 in a 32-bit value. So if you, like in Haskell or Python,
extend the value to hold 10^12 for correct mathematics, you are now
"wrong" regarding the size of the object and its efficiency in coding.

If you are designing a programming language, you make decisions about
how these are handled - and different languages take different approaches.

In C, unsigned arithmetic is done as modulo - so the result of
multiplying two 32-bit unsigned values 1000000 /is/ 3,567,587,328. It
is not wrong, it is correct - mathematically correct. It is just a
different mathematical operation than you might first expect.

For signed arithmetic, C takes the stance that the many different ways
of handling it are wrong in too many cases, and inefficient to implement
in at least some target architectures - so it makes it undefined behaviour.
Post by bartc
My point is that the emphasis seems to be on avoiding undefined
behaviour rather than getting correct behaviour.
I would call that erroneous result above 'overflow', but C suggests that
that is not possible. The answer is still wrong however if A and B
represent, say, width and height of some rectangle, and you want to work
out its area.
(Solutions needn't be expensive when a wider type is available, eg. 64
bits when A and B are 32.)
bartc
2017-04-29 14:22:51 UTC
Permalink
Post by David Brown
Post by bartc
It's wrong because it gives the wrong answer, and is likely an error in
the program if it was intended to be mathematically correct.
What would, in your opinion, be the "right" answer - given the
constraint that the value has to be stored in a 32-bit integer (signed
or unsigned - you pick) ?
There is no right answer. But that's the same whether signed or
unsigned. It might go wrong less often with unsigned because it has one
bit of extra range. But that might be the only reason why something is
unsigned, because it can represent bigger values. Nothing to do with
avoiding UB with signed or wanting modular arithmetic with unsigned.

It's a pure numeric problem that ought to be the same with any language
(one that doesn't auto-scale to arbitrary precision):

You have a numeric type that has a range of roughly 0..N or -N/2..N/2,
and have to consider what might happen if an operation on A and B, both
within such a range, has a mathematical result (ie. what you'd get doing
the same op on a Casio) outside the range.
--
bartc
David Brown
2017-04-29 14:52:33 UTC
Permalink
Post by bartc
Post by David Brown
Post by bartc
It's wrong because it gives the wrong answer, and is likely an error in
the program if it was intended to be mathematically correct.
What would, in your opinion, be the "right" answer - given the
constraint that the value has to be stored in a 32-bit integer (signed
or unsigned - you pick) ?
There is no right answer. But that's the same whether signed or
unsigned. It might go wrong less often with unsigned because it has one
bit of extra range. But that might be the only reason why something is
unsigned, because it can represent bigger values. Nothing to do with
avoiding UB with signed or wanting modular arithmetic with unsigned.
It's a pure numeric problem that ought to be the same with any language
You have a numeric type that has a range of roughly 0..N or -N/2..N/2,
and have to consider what might happen if an operation on A and B, both
within such a range, has a mathematical result (ie. what you'd get doing
the same op on a Casio) outside the range.
Why should it be the same in any language? It is /not/ a "purely
numeric problem" - precisely because there is no "purely numeric"
correct answer.

For some languages, the style and philosophy of the language means that
any such overflows should result in an exception or other run-time error
handling. For some languages, the style and philosophy of the language
means that the arithmetic should be handled as quickly and efficiently
as possible, with the issues surrounding overflow being ignored. And
for some languages, auto-scaling (or perhaps just using a fixed size but
very large integer type) is the only suitable choice.
Ben Bacarisse
2017-04-29 14:18:20 UTC
Permalink
Post by bartc
Post by Ben Bacarisse
<snip>
Post by bartc
So (32-bit) unsigned A and B are both 1000000, A*B is 3,567,587,328
rather than 1,000,000,000,000, but no problem because it's modular
arithmetic, and its behaviour is well defined. Wrong, but at least
it's not UB. It will be predictably wrong on all systems.
Why is it wrong? I like languages like Haskell where 10^6 * 10^6 ==
10^12 but that comes at a price I doubt you'd consider acceptable for
C. Slightly cheaper (though still unacceptable to most C programmers)
would be to insist that all such operations raise an exception.
The cheapest would be to have made unsigned overflow undefined too, but,
again, I doubt you'd want *more* UB in C. The current definition of
unsigned arithmetic is cheap to implement, easy to understand and very
useful. That's not what I'd call "wrong".
It's wrong because it gives the wrong answer, and is likely an error
in the program if it was intended to be mathematically correct.
That's not an interesting kind of wrong -- it depends on what the intent
was (you didn't say) and on a bunch of definitions. The interesting
meaning (which was the point I thought you were making) is that some of
C's arithmetic choices were the wrong ones.
Post by bartc
My point is that the emphasis seems to be on avoiding undefined
behaviour rather than getting correct behaviour.
I would call that erroneous result above 'overflow', but C suggests
that that is not possible. The answer is still wrong however if A and
B represent, say, width and height of some rectangle, and you want to
work out its area.
(Solutions needn't be expensive when a wider type is available, eg. 64
bits when A and B are 32.)
But that was not available when C was being defined, and by the time 64
bit types were mandated, too much code was written with the old
semantics. And, of course, that does not solve the problem but simply
pushes it to another type. Provided you mean "it's not what I want
nowadays" then I accept that C's choices are "wrong". But if you mean
more than that, what should the rules have been (around 1980) to avoid
people on Usenet declaring it wrong 35 years later?
--
Ben.
Jean-Marc Bourguet
2017-04-29 12:25:41 UTC
Permalink
Post by Ben Bacarisse
The cheapest would be to have made unsigned overflow undefined too
IIRC from DMR's papers on C history, unsigned was introduced to get the wrap
around behavior (which was already available on pointers; pointers behaved at
that point in C's evolution as a mix of what pointers and unsigned are now).

Yours,
--
Jean-Marc
Jerry Stuckle
2017-04-29 12:05:27 UTC
Permalink
Post by Malcolm McLean
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
If you properly verify the size is within the appropriate limits, and
reject images which are too large, this can't happen.

This is exactly why there are so many successful hacks. Too many
programmers don't validate input. NEVER trust input from a user.
Verify EVERYTHING.
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
Gareth Owen
2017-04-29 12:34:15 UTC
Permalink
Too many programmers don't validate input. NEVER trust input from a
user. Verify EVERYTHING.
This should be printed in Bold Capitals in the frontspiece of every
introductory programming book (and most advanced ones too).
Malcolm McLean
2017-04-29 12:55:19 UTC
Permalink
Post by Jerry Stuckle
Post by Malcolm McLean
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
If you properly verify the size is within the appropriate limits, and
reject images which are too large, this can't happen.
This is exactly why there are so many successful hacks. Too many
programmers don't validate input. NEVER trust input from a user.
Verify EVERYTHING.
It's a bit harder than it looks.

int width = get32(file);
int height = get32(file);
int buffsize = width * height; // uh oh. UB here, can't do that

if( (double) width * (double) height < sanity) // this is a way, but it's fiddly,
// and what is "sanity" for an image? Not all images are intended for human viewing on a monitor screen.
GOTHIER Nathan
2017-04-29 13:11:09 UTC
Permalink
On Sat, 29 Apr 2017 05:55:19 -0700 (PDT)
Post by Malcolm McLean
It's a bit harder than it looks.
int width = get32(file);
int height = get32(file);
int buffsize = width * height; // uh oh. UB here, can't do that
if( (double) width * (double) height < sanity) // this is a way, but it's
fiddly and what is "sanity for an image? Not all images are intended for human
viewing on a monitor screen.

sanity should be less than or equal to INT_MAX / (sizeof (pixel_t)).
David Brown
2017-04-29 13:24:39 UTC
Permalink
Post by Malcolm McLean
Post by Jerry Stuckle
Post by Malcolm McLean
Post by bartc
What are you saying, that ANY multiply of signed int is undefined?
Let's say we're loading an image. width and height are 32-bit quantities.
We multiply them together and call malloc to create pixel buffer, then
loop over height and width to populate it from the file.
If int is 32 bits, some malicious person can contrive a file such that
the multiplication overflows, our buffer is too small, and we have a
security leak. Unless we have an environment where an overflow leads
to termination of the program with an error message.
If you properly verify the size is within the appropriate limits, and
reject images which are too large, this can't happen.
This is exactly why there are so many successful hacks. Too many
programmers don't validate input. NEVER trust input from a user.
Verify EVERYTHING.
It's a bit harder than it looks.
int width = get32(file);
int height = get32(file);
int buffsize = width * height; // uh oh. UB here, can't do that
if( (double) width * (double) height < sanity) // this is a way, but it's
fiddly and what is "sanity for an image? Not all images are intended for human viewing on a monitor screen.
1. It is /not/ hard - it is very simple, and C provides a number of
different methods for doing this safely. Use long longs - problem solved.

2. It is also simple to use application-specific limitations. You can
quite reasonably say that the application can't handle widths or heights
greater than 32767 - problem solved. (Certainly that would be fine on
any platform that does not have 64-bit types.)

3. If you really find a situation where validating the input is hard,
then you do the hard work and write the code anyway. "Hard" is not an
excuse for being lazy or wrong.
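(Editorial sketch, not part of the original post, combining points 1 and 2; the function name and bytes_per_pixel parameter are illustrative, and long long is assumed to be wider than the 32-bit width and height fields.)

#include <limits.h>
#include <stdlib.h>

/* Validate before allocating: reject nonsense and over-limit sizes first,
   and do the multiplications in long long so they cannot overflow. */
static void *alloc_pixels(long long width, long long height,
                          long long bytes_per_pixel)
{
    if (width <= 0 || height <= 0 || bytes_per_pixel <= 0)
        return NULL;
    if (width > INT_MAX || height > INT_MAX)
        return NULL;                          /* application limit */
    long long npixels = width * height;       /* fits: both <= INT_MAX */
    if (npixels > INT_MAX / bytes_per_pixel)
        return NULL;                          /* total size too large */
    return malloc((size_t)(npixels * bytes_per_pixel));
}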
Gareth Owen
2017-04-29 13:57:52 UTC
Permalink
Post by Malcolm McLean
int buffsize = width * height; // uh oh. UB here, can't do that
If only * had some sort of inverse operation we could use to check
whether width*height > INT_MAX

Let's pretend there's an operator called \ that does
inverse-multiplication. Then we could write

if(width <= INT_MAX \ height) {
// No UB!!
}

But without that operator, we're definitely stuck with casting
everything to doubles.
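(Editorial sketch, not part of the original post: the "inverse operation" is ordinary integer division, and the joke spelled out in real C looks like this.)

#include <limits.h>

/* True if width * height is representable as an int; the division
   cannot overflow, so the whole test is well defined. */
static int area_fits_int(int width, int height)
{
    if (width <= 0 || height <= 0)
        return 0;
    return width <= INT_MAX / height;
}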
Keith Thompson
2017-04-28 15:55:10 UTC
Permalink
[...]
Post by bartc
Post by Keith Thompson
It can be tempting to use "<<" as a replacement for multiplying by
a power of 2. For example, one might write `n << 3` rather than
`n * 8`. But it's UB if n is negative -- and it's usually better
to write `n * 8` and let the compiler figure out whether it makes
sense to use a shift instruction.
With gcc (on my x86 machine), it uses left shift to multiply a signed
int by 8, one that it doesn't know whether it's negative or not.
It knows that left-shifting a negative int will work. I think it will
always work on two's complement (ie. give the same result as multiply),
and many will spend their entire working lives working with such
machines. Yet they are not allowed to assume that. Ever.
You're allowed to assume anything you like -- but you should be aware
that your assumptions might not be universally valid, even if your CPU
uses 2's-complement.
Post by bartc
As for UB on left-shifting 3 places, isn't this the same UB you get when
multiplying by 8 and the result might overflow? In that case it doesn't
seem to matter which is used.
No. (-3 * 8) is well defined. (-3 << 8) has undefined behavior.
Post by bartc
(BTW why didn't the language just prohibit shifts on signed values if it
frowns upon them so much?)
Shifts on *non-negative* signed values are well defined as long as the
result is representable in the type. Prohibiting shifts on signed
values would make (1 << 3) invalid (you'd have to write (1U << 3)).
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
j***@gmail.com
2017-04-28 03:20:01 UTC
Permalink
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
Not sure how.
Post by Keith Thompson
The behavor of left-shifting a negative value is undefined.
Without context, I suppose it is.

But mapping it to a logical shift left on the CPU, whatever the CPU gives for that, seems reasonable to me.

Far more reasonable than assuming that casting a minus one to an unsigned type should always result in a bit mask of the type's width.
Post by Keith Thompson
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
Have you even left-shifted an unsigned char expecting specifically for the upper bits to disappear, instead of pulling the unsigned char into an unsigned long to do the shift and then masking the upper bits back out with a cast back to unsigned char?

And hoping the compiler (which often misses such optimizations) will optimize out your excessive code on CPUs that will allocate a double short register to do the long shift?
Post by Keith Thompson
The behavior of converting an integer (either signed or unsigned) value
to an unsigned type is well defined; the result is adjusted modulo
MAX+1, where MAX is the maximum value of the unsigned type. This is
described in N1570 6.3.1.3p2.
(N1570 is the latest publicly available draft of the ISO C standard,
available at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf .)
[...]
Post by j***@gmail.com
Unsigned char promoted to signed int?
Yes. The promotion rules are in N1570 6.3.1.1p2. The idea is that
there are no arithmetic operations on types narrower than int or
unsigned int, so anything narrower must be "promoted" before you can
operate on it.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is converted to
an int; otherwise, it is converted to an unsigned int. These are
called the *integer promotions*.
The 1989 ANSI C committee had to choose between two schemes, both
implemented by existing compilers at the time: "unsigned preserving" and
"value preserving". They chose the latter. (I think I would have
preferred unsigned preserving semantics myself, but I haven't taken the
time to think about possible disadvantages.) This is discussed in the
C99 Rationale, section 6.3.1.1, available at
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf .
If I had the time, I'd go read it. I'm sure they thought they had a good reason.

I'll still push for splitting the standard if anyone is willing to discuss it.

Embedded has drastically different needs than desktop or server, and I even think desktop and server might be well served by fine-tuning the standard to the target.

--
Joel Rees

"Is the standard made for man? Or is man made for the standard?"

"Grasshopper. Go bother some other guru for a while."
j***@gmail.com
2017-04-28 03:26:47 UTC
Permalink
(erk)
Post by j***@gmail.com
[...]
Have you even left-shifted an unsigned char expecting specifically
erm, "ever"

(my bad. Sorry.)
Post by j***@gmail.com
for the upper bits to disappear, instead of pulling the unsigned
char into an unsigned long to do the shift and then masking the
upper bits back out with a cast back to unsigned char?
[...]
--
Joel Rees
"Is the standard made for man? Or is man made for the standard?"
"Grasshopper. Go bother some other guru for a while."
David Brown
2017-04-28 08:19:06 UTC
Permalink
Post by j***@gmail.com
Post by Keith Thompson
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on
a negative operand is specified on sign-magnitude architectures
or something, but we have this assurance that casting a -1 to an
unsigned fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
Not sure how.
Post by Keith Thompson
The behavor of left-shifting a negative value is undefined.
Without context, I suppose it is.
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Post by j***@gmail.com
But mapping it to a logical shift left on the CPU, whatever the CPU
gives for that, seems reasonable to me.
Logical shifts work naturally on /unsigned/ types.
Post by j***@gmail.com
Far more reasonable than assuming that casting a minus one to an
unsigned type should always result in a bit mask of the type's
width.
That is perfectly reasonable for modulo arithmetic on unsigned types.

There are a variety of ways one could define arithmetic and overflow for
signed and unsigned types. Different methods have their advantages and
disadvantages, and different programming languages pick different methods.

In C, unsigned arithmetic is always modulo, and so overflows wrap
around. That fits well with most hardware, and has a lot of practical
benefits. But it also means that "u < u + 1" might be false, which is
counter-intuitive for normal mathematics. You can take an unsigned
number, keep adding to it, and suddenly end up with nothing.

Signed arithmetic, on the other hand, never overflows in C. It is
undefined behaviour - as far as the language is concerned, it can't
happen. That means "x < x + 1" is /always/ true as far as C is
concerned, for signed type x. This sort of thing makes mathematical
sense, and gives compilers freedom to re-arrange and optimise code in a
way that makes sense to programmers. You may think "I don't write
conditionals like that" - but it can be surprising how often simple
mathematical manipulations of expressions are valid when overflow cannot
happen, but invalid in two's complement wraparound signed arithmetic.
(An example is "(x * 2) / 2" -> "x".)
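(Editorial sketch, not part of the original post: a pair of functions showing the difference. An optimising compiler /may/ fold the signed version to a plain return of x, but must keep the wraparound in the unsigned one; this is about what the language permits, not a promise about any particular compiler's output.)

/* Signed: overflow is undefined behaviour, so "(x * 2) / 2" may
   legitimately be simplified to just x. */
int fold_signed(int x)
{
    return (x * 2) / 2;
}

/* Unsigned: wraparound is defined, so when x * 2u wraps the division
   really does lose the top bit - this cannot be simplified to x. */
unsigned fold_unsigned(unsigned x)
{
    return (x * 2u) / 2u;
}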

Certain other behaviour, such as the conversion of an unsigned integer
to a signed integer when the value won't fit, is left as implementation
defined behaviour - basically, one should expect the most "natural"
behaviour on the hardware, and this should be documented by the
compiler. In many cases, this conversion will be modulo wraparound -
but compilers /may/ do something else, if that operation would be expensive.

Would it be nice to have different behaviour of C integers? Perhaps -
it is certainly a valid viewpoint. Personally, I am happy with the way
they work in most cases, but there are a few corner cases that I don't
like. I would prefer automatic promotions of unsigned char and unsigned
short to be to unsigned int, rather than signed int. I would like
certain implementation-defined behaviour, such as the conversion to a
signed integer mentioned above, to be "conditionally defined behaviour"
- if the hardware uses two's complement signed integers, then such
conversions should be defined as modulo wraparound, otherwise they can
be implementation dependent.

But whatever you think about the rules, however you think they could be
different, however you imagine they /were/ different in some old version
of C or some old compilers - the rules are there, they are clear and
consistent, and there should not be a problem learning them and using them.
Post by j***@gmail.com
Post by Keith Thompson
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a
negative integer.)
Have you even left-shifted an unsigned char expecting specifically
for the upper bits to disappear, instead of pulling the unsigned char
into an unsigned long to do the shift and then masking the upper
bits back out with a cast back to unsigned char?
Unsigned chars promote to int (unless they are already the same size as
int, in which case they work like unsigned int). An int can always
represent all the values of the unsigned char as positive values, and
left shifting of up to 7 bits is /always/ going to work correctly.
Converting back to an unsigned char will always mask off any extra bits.
So if you have code like:

unsigned char uc = ...;
int n = ...;
uc = uc << n;

Then the code will work as expected unless "n" shifts out /all/ the bits
in the unsigned char (in which case it might work, and might not).
Post by j***@gmail.com
And hoping the compiler (which often misses such optimizations) will
optimize out your excessive code on CPUs that will allocate a double
short register to do the long shift?
Most compilers will handle code like the above quite well.

<snip>
Post by j***@gmail.com
If I had the time, I'd go read it. I'm sure they thought they had a good reason.
I'll still push for splitting the standard if anyone is willing to discuss it.
You want to split and change the standard - but you haven't read it or
the reasoning behind it? I'd study the rationale and the latest
standard first, before trying to split it. (I don't think a rationale
for C11 has been published.)
Post by j***@gmail.com
Embedded has drastically different needs than desktop or server, and
I even think desktop and server might be well served by fine-tuning
the standard to the target.
I work with embedded programming all the time - I almost never use C on
desktops or servers (usually I use Python for such work). And the more
I learn about the subtleties of C - often from this group - the more I
realise that there is very, very little I would want to change in this
sort of area. There are a few points in the standard that I would prefer
to be modified. There are a number of gcc extensions or
implementation-defined behaviours that I think are of significant
benefit to my coding - and would therefore be useful in all compilers.
There are a few (not many) points in the standard that I think could be
simplified by assuming the target is a "normal" modern processor. There
are a few changes or additions I would like in the standard library.
And there are more than a few places where the standard could benefit
from clearer language or better examples.

But the fundamentals - including the way signed and unsigned work, and
the undefined behaviours, I am happy with.
Post by j***@gmail.com
-- Joel Rees
"Is the standard made for man? Or is man made for the standard?"
"Grasshopper. Go bother some other guru for a while."
James Kuyper
2017-04-28 13:57:16 UTC
Permalink
On 04/28/2017 04:19 AM, David Brown wrote:
...
Post by David Brown
Unsigned chars promote to int (unless they are already the same size as
int, in which case they work like unsigned int).
Size isn't the relevant characteristic, though it is highly correlated
with it. If UCHAR_MAX > INT_MAX, unsigned char promotes to unsigned int
rather than int.
Post by David Brown
An int can always
represent all the values of the unsigned char as positive values,
Change "always" to "usually", or at least "almost always". A fully
conforming implementation can have UCHAR_MAX > INT_MAX. This is
nit-picking, since such systems are rare, possibly non-existent - but
you invite nit-picking when you insist on using absolutes such as "always".
Post by David Brown
... and
left shifting of up to 7 bits is /always/ going to work correctly.
Unless CHAR_BIT > 8, which happens on many real-world systems.
Post by David Brown
Converting back to an unsigned char will always mask off any extra bits.
That, at least, is true.
David Brown
2017-04-28 15:06:06 UTC
Permalink
Post by James Kuyper
...
Post by David Brown
Unsigned chars promote to int (unless they are already the same size as
int, in which case they work like unsigned int).
Size isn't the relevant characteristic, though it is highly correlated
with it. If UCHAR_MAX > INT_MAX, unsigned char promotes to unsigned int
rather than int.
Post by David Brown
An int can always
represent all the values of the unsigned char as positive values,
Change "always" to "usually", or at least "almost always". A fully
conforming implementation can have UCHAR_MAX > INT_MAX. This is
nit-picking, since such systems are rare, possibly non-existent - but
you invite nit-picking when you insist on using absolutes such as "always".
Nit-picking is good - at least in this kind of discussion. So I am very
happy to be corrected.

I did mean to write "always" - but I intended that part to refer only to
cases where unsigned chars promote to signed int rather than unsigned
int (because if they are promoted to unsigned int, then the shift is
done as unsigned and is fully defined with modulo wraparound semantics).

I know of (and have used) systems where "char" and "int" are the same
bit-size - a couple of 16-bit DSP devices and (briefly) a 32-bit DSP
where int, short and char were all 16-bit or 32-bit respectively.
Unsigned chars promote to unsigned int on such systems.
Post by James Kuyper
Post by David Brown
... and
left shifting of up to 7 bits is /always/ going to work correctly.
Unless CHAR_BIT > 8, which happens on many real-world systems.
Let me see if I've got my reasoning correct here. (And as noted above,
I am happy to be corrected if I've got something wrong. I'm splitting
up the points into short paragraphs to make it easier to point out any
mistakes.)

CHAR_BIT must be at least 8 - this is the number of bits in an unsigned
char (all of which must be value bits, not padding bits).

An unsigned int must have at least as many value bits as CHAR_BIT, and
at least 16 value bits, but can have padding bits as well. It must have
a "pure binary" representation.

A signed char must have exactly one sign bit, no padding bits, and
CHAR_BIT - 1 value bits. Positive values must have the same
representation as the same value of unsigned char.

A signed int must have at least as many value bits as a signed char, and
at least 15 value bits. It can have padding bits. Positive values must
have the same representation (excluding padding bits) as for unsigned
ints. It is possible for a signed int to have fewer value bits than an
unsigned int, but not more value bits.

If UCHAR_MAX > INT_MAX, then there are unsigned char values that cannot
be represented as signed ints. Therefore, unsigned char promotes to
unsigned int for shifting. Since unsigned ints have defined overflow,
it is safe to left-shift them by any non-negative amount less than the
width of unsigned int. Since that width must be at least 16, it is safe
to shift by up to 15 (and therefore up to 7).

If UCHAR_MAX <= INT_MAX, unsigned char values promote to signed int (as
positive values). It is safe to left-shift this as long as there is no
overflow of the int.

To be unsafe on a shift of 7 bits, that means the int must have fewer
than 7 value bits above those of unsigned char, or a width (value bits
plus sign bit) of less than 8 bits above that of an unsigned char. It
must also have more than 0 extra bits (otherwise the unsigned char would
have been promoted to unsigned int). And because a signed int has at
least 16 bits of width, this means that we need a CHAR_BIT greater than
8, and a signed int whose representation is at least twice CHAR_BIT in
raw bit size (since it is always a multiple of a char), but whose
non-padding bit size is less than 8 bits more than that of a char. So
we are talking about something like a 9-bit char, with an int being 16
bits of data with 2 padding bits. Yes, I can agree that this is
possible by the C standards - but I don't think it will happen "on many
real-world systems".
Post by James Kuyper
Post by David Brown
Converting back to an unsigned char will always mask off any extra bits.
That, at least, is true.
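(Editorial sketch, not part of the original posts: one way to sidestep the promotion question entirely is to force the operand to a known unsigned width before shifting, assuming <stdint.h> is available.)

#include <stdint.h>

/* Left-shift an 8-bit value by n (0..7) with fully defined behaviour,
   regardless of whether unsigned char would promote to int or unsigned int. */
static uint8_t shl8(uint8_t v, unsigned n)
{
    return (uint8_t)((uint32_t)v << n);   /* conversion back truncates mod 256 */
}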
s***@casperkitty.com
2017-04-28 15:38:56 UTC
Permalink
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Most people write code for actual machines, and so the context should be the
language processed for use on actual machines.

If the Standard had included the following two notes:

1. The fact that the Standard would allow an implementation targeting
a particular platform to behave in a certain way does not mean that
such behavior by that implementation would not be downright stupid
or obtuse.

2. The fact that a program targeting a particular platform might
malfunction when processed by an implementation which is downright
stupid or obtuse does not imply that it is defective.

that would undermine claims that the Standard requires programmers to jump
through hoops for portability even when the target platform is known. Such
statements would have been viewed as insulting and condescending (which is
likely why no such statements appear in the Standard) but I don't think
anyone could reasonably argue that the authors of the Standard intended the
opposite.
David Brown
2017-04-28 16:51:18 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Most people write code for actual machines, and so the context should be the
language processed for use on actual machines.
Not in this newsgroup, no. The context is C programming - and what the
C standards say overrules what happens to be the implementation details
in some systems, precisely because implementations can vary so much.
The standards are the /only/ fixed references we have - so they are the
context unless explicitly stated otherwise.
Post by s***@casperkitty.com
1. The fact that the Standard would allow an implementation targeting
a particular platform to behave in a certain way does not mean that
such behavior by that implementation would not be downright stupid
or obtuse.
2. The fact that a program targeting a particular platform might
malfunction when processed by an implementation which is downright
stupid or obtuse does not imply that it is defective.
And if pigs could fly, you could get your dog to do your coding. It is just
as silly a hypothetical situation.
Post by s***@casperkitty.com
that would undermine claims that the Standard requires programmers to jump
through hoops for portability even when the target platform is known.
Programmers /don't/ have to jump through hoops for portability when the
target platform is known. You can write C code that it is tied to a
specific target (processor, compiler, etc.) rather than portable code.
This is one of the key uses of C from its first inception, and is a key
use of it today. /Some/ code has to be maximally portable, some code
has to be portable within certain limitations, and some code does not
have to be portable at all.
Post by s***@casperkitty.com
Such
statements would have been viewed as insulting and condescending (which is
likely why no such statements appear in the Standard) but I don't think
anyone could reasonably argue that the authors of the Standard intended the
opposite.
You regularly claim to know the minds of the standards authors, and seem
to think you know what they meant but did not write. Just because you
cannot see any written evidence why they might not have meant something, does
not mean that they actually meant it but for some reason didn't write it.
s***@casperkitty.com
2017-04-29 19:43:02 UTC
Permalink
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Did C89 become off-topic? In C89, left-shifting a negative value was
defined in a way that made sense for two's-complement systems, but was
probably less than ideal on ones'-complement platforms (an implementation
which targeted ones' complement hardware could efficiently handle code
written for two's-complement machines if it made bitwise operators on
negative numbers behave as they would for two's-complement equivalents,
but the way C89 was written required that -1<<1 yield -3 rather than -2
on ones'-complement systems).

It would seem likely that the authors of C99 wanted to allow implementations
to process left-shifts of negative numbers in whatever fashion would make
the most sense given the target platforms and purposes. Making the behavior
Undefined certainly has that effect. Is there any reason to believe that
it was intended to invite implementations where exactly one behavior would
make sense, to arbitrarily select other nonsensical behaviors instead?

BTW, I find it ironic that C99 went out of its way to accommodate sign-
magnitude and ones'-complement implementations, but then imposed other
requirements that make it impractical to implement on such machines (are
there *any* conforming C99-or-later compilers for actual ones'-complement
hardware? Do they use 64-bit or larger word size, or do they somehow do
multi-word math in base 2)?

For that matter, does anyone know if the Univac 36-bit ones'-complement
system which has a 71-bit signed long long but no unsigned long long,
actually uses a pure binary notation for its signed long long type? I
would expect it would be much more efficient to use a type with a base of
2^36-1. If integers wrap mod that base, then given:

int v1h,v2h, resh;
nativeUnsigned v1l,v2l, resl; // Wrap mod 2^36-1 rather than mod 2^36

the sum could be computed, even without any "carry-flag" feature, via:

resl = v1l + v2l;
resh = ((v1l ^ v2l ^ resl) & 1) + v1h + v2h;

C99's requirement that all numbers use a pure binary representation would
forbid such an approach, however. I'm not sure exactly what hoops the
implementation had to jump through to achieve mod-2^36 unsigned arithmetic,
but I would expect it to make math on base-2^36 types much more expensive.
I don't know the Univac's instruction set, but would expect the above would
translate very straightforwardly into machine code with no branching; I
would expect code for unsigned long long to be a lot more complicated.
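
That carry-recovery step can be checked in ordinary C by emulating base-(2^36-1)
digits in a uint64_t; this is only a sketch of the arithmetic described above
(the comparison stands in for the hardware's end-around carry, and none of it is
real Univac code):

    #include <assert.h>
    #include <stdint.h>

    #define DIGIT_MOD (((uint64_t)1 << 36) - 1)   /* digits wrap mod 2^36-1 */

    /* Add two digits; the comparison emulates the native wraparound, and the
       xor trick from above recovers the end-around carry from the result. */
    static uint64_t add_digit(uint64_t a, uint64_t b, unsigned *carry)
    {
        uint64_t sum = a + b;                      /* cannot overflow uint64_t */
        uint64_t low = sum >= DIGIT_MOD ? sum - DIGIT_MOD : sum;
        *carry = (unsigned)((a ^ b ^ low) & 1);    /* 1 exactly when it wrapped */
        return low;
    }

    int main(void)
    {
        unsigned c;
        assert(add_digit(DIGIT_MOD - 1, 2, &c) == 1 && c == 1);  /* wraps   */
        assert(add_digit(5, 7, &c) == 12 && c == 0);             /* no wrap */
        return 0;
    }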
Tim Rentsch
2017-04-30 21:02:31 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Did C89 become off-topic? In C89, left-shifting a negative value was
defined in a way [...]
As he is wont to do, supercat is talking out of school here.
Before the C99 standard, a left shift of a negative value
is not defined behavior but implementation-defined behavior.
Quoting the official response in DR 81:

Subclause 6.3 states that the binary operator < , among
others, has implementation-defined aspects for signed types.
Therefore, the answer to ``What does it mean to left shift a
signed value?'' is that it is implementation-defined.
Keith Thompson
2017-04-30 21:25:04 UTC
Permalink
Post by Tim Rentsch
Post by s***@casperkitty.com
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Did C89 become off-topic? In C89, left-shifting a negative value was
defined in a way [...]
As he is wont to do, supercat is talking out of school here.
Before the C99 standard, a left shift of a negative value
is not defined behavior but implementation-defined behavior.
Subclause 6.3 states that the binary operator < , among
others, has implementation-defined aspects for signed types.
Therefore, the answer to ``What does it mean to left shift a
signed value?'' is that it is implementation-defined.
It would be helpful if you'd cite the DR you're referring to:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_081.html

(Note that the DR incorrectly refers to "<" rather than "<<",
presumably a typo.)

I'm not sure that supercat's interpretation is entirely unreasonable
given what the C90 standard actually says.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-04-30 22:01:28 UTC
Permalink
Post by Keith Thompson
Post by Tim Rentsch
Post by s***@casperkitty.com
Post by David Brown
The context here is C programming unless explicitly stated otherwise,
and in that context left-shifting a negative value is undefined.
Did C89 become off-topic? In C89, left-shifting a negative value was
defined in a way [...]
As he is wont to do, supercat is talking out of school here.
Before the C99 standard, a left shift of a negative value
is not defined behavior but implementation-defined behavior.
Subclause 6.3 states that the binary operator < , among
others, has implementation-defined aspects for signed types.
Therefore, the answer to ``What does it mean to left shift a
signed value?'' is that it is implementation-defined.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_081.html
Yes, if I had had the URL handy then I might have thought to
include it. But I didn't, so I didn't.
Post by Keith Thompson
(Note that the DR incorrectly refers to "<" rather than "<<",
presumably a typo.)
Right. By now I've read these things so many times I often miss
the typos.
Post by Keith Thompson
I'm not sure that supercat's interpretation is entirely
unreasonable given what the C90 standard actually says.
It's perfectly reasonable if it had been qualified with an "I
think that ..." or "It looks like ..." or something similar.
I called it talking out of school because he offers the
interpretation as an absolute certainty, without bothering
to double check what should be an obvious red flag. He's
been told about this before and he just doesn't bother.
His screwy interpretations are wrong more often than not
(or at least that is my strong impression), and I keep hoping
he will exercise more due diligence in the future. Sorry
to say he seems to have a very thick hide, and like he said
in another posting he's a glutton for punishment. So I
don't recommend that anyone hold their breath on that one.
James Kuyper
2017-04-28 13:48:34 UTC
Permalink
...
Post by j***@gmail.com
Post by Keith Thompson
Post by j***@gmail.com
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
Not sure how.
If you avoid using such inflammatory terms, you'll find yourself
avoiding thinking them too, which will greatly improve the quality and
accuracy of your thinking.
Post by j***@gmail.com
Post by Keith Thompson
The behavior of left-shifting a negative value is undefined.
Without context, I suppose it is.
No, it's undefined behavior BECAUSE of the context, which includes the C
standard.
Post by j***@gmail.com
But mapping it to a logical shift left on the CPU, whatever the CPU
gives for that, seems reasonable to me.
Undefined behavior permits that choice, it just doesn't mandate it.
Post by j***@gmail.com
Far more reasonable than assuming that casting a minus one to an
unsigned type should always result in a bit mask of the type's width.
I'm curious why you find that unreasonable. Understand - it isn't a
special case rule for -1, it's a natural consequence of applying C's
general rule for converting signed values to unsigned types. That rule
was specifically designed so that on 2's complement machines
(overwhelmingly the most common type), conversion of an m-bit signed
integer value to an n-bit unsigned type is a no-op if m==n, and involves
merely extracting the n lowest bits when n<m. This strikes you as an odd
choice?
Post by j***@gmail.com
Post by Keith Thompson
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
Have you ever left-shifted an unsigned char expecting specifically
for the upper bits to disappear,
No.
Post by j***@gmail.com
... instead of pulling the unsigned char into
an unsigned long to do the shift and then masking the upper bits back
out with a cast back to unsigned char?
I'd just mask the value directly.
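
A sketch of what masking directly can look like (the helper name is made up
here; the point is just that an explicit mask does the truncation, with no
detour through a wider type):

    #include <limits.h>

    unsigned char next_mask(unsigned char chMask)
    {
        /* The shift happens in int after the integer promotion; masking with
           UCHAR_MAX keeps only the char-sized bits. */
        return ((chMask << 1) | 1) & UCHAR_MAX;
    }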
Post by j***@gmail.com
And hoping the compiler (which often misses such optimizations) will
optimize out your excessive code on CPUs that will allocate a double
short register to do the long shift?
Keep in mind that if you write the code explicitly in the way you assume
is most efficient, the compiler is free to disagree with you and
generate machine instructions with equivalent behavior that corresponds
more closely to the code you didn't want to write. If you don't trust
the compiler to generate sufficiently efficient code, you shouldn't be
using C, you should be using assembler, which gives you more control.
s***@casperkitty.com
2017-04-28 15:07:49 UTC
Permalink
Post by j***@gmail.com
Post by Keith Thompson
The 1989 ANSI C committee had to choose between two schemes, both
implemented by existing compilers at the time: "unsigned preserving" and
"value preserving". They chose the latter. (I think I would have
preferred unsigned preserving semantics myself, but I haven't taken the
time to think about possible disadvantages.) This is discussed in the
C99 Rationale, section 6.3.1.1, available at
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf .
If I had the time, I'd go read it. I'm sure they thought they had a good reason.
Do read it, and you'll see that despite what other people here claim, the
authors of the Standard were expecting that even though the Standard made
allowances for platforms that performed integer arithmetic in weird ways,
they expected implementations to evolve toward handling things like

unsigned mul(unsigned short x, unsigned short y) { return x*y; }

in sensible fashion even in cases where the result is between INT_MAX+1U and
UINT_MAX.

Incidentally, the reason that (unsigned)-1 has all bits set is that the
Standard requires that UINT_MAX be one smaller than a power of two, adding
1 to UINT_MAX gives zero, and adding x to -x gives zero. Put those together
and (unsigned)-1 is required to give a number which, when added to 1, yields
zero.
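
Spelled out as a tiny check (a sketch; the asserts just restate that chain of
reasoning):

    #include <assert.h>
    #include <limits.h>

    int main(void)
    {
        unsigned u = (unsigned)-1;   /* converted by adding UINT_MAX + 1 to -1 */
        assert(u == UINT_MAX);       /* one less than a power of two: all bits set */
        assert(u + 1U == 0U);        /* and adding 1 wraps around to zero */
        return 0;
    }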
David Brown
2017-04-28 16:56:27 UTC
Permalink
Post by s***@casperkitty.com
Post by j***@gmail.com
Post by Keith Thompson
The 1989 ANSI C committee had to choose between two schemes, both
implemented by existing compilers at the time: "unsigned preserving" and
"value preserving". They chose the latter. (I think I would have
preferred unsigned preserving semantics myself, but I haven't taken the
time to think about possible disadvantages.) This is discussed in the
C99 Rationale, section 6.3.1.1, available at
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf .
If I had the time, I'd go read it. I'm sure they thought they had a good reason.
Do read it, and you'll see that despite what other people here claim, the
authors of the Standard were expecting that even though the Standard made
allowances for platforms that performed integer arithmetic in weird ways,
they expected implementations to evolve toward handling things like
unsigned mul(unsigned short x, unsigned short y) { return x*y; }
in sensible fashion even in cases where the result is between INT_MAX+1U and
UINT_MAX.
Read the rationale, and you'll see there is no evidence for this claim
whatsoever.

Just because the standards committee did not write explicitly "lets add
a fun trap to catch people trying to multiply unsigned shorts" in either
the standards or the rationales, does /not/ mean that they really meant
to write "here are all the rules for integer promotions, implicit
conversions, expression evaluation, etc., all written out clearly and
consistently for a balance between compiler freedom to optimise, support
for varied target architectures, efficient code generation, and ease of
learning and ease of use for the programmer. And we'll add one special
exception just for our dear friend Supercat - if you want to multiply
two unsigned shorts, the rules are different."
Post by s***@casperkitty.com
Incidentally, the reason that (unsigned)-1 has all bits set is that the
Standard requires that UINT_MAX be one smaller than a power of two, adding
1 to UINT_MAX gives zero, and adding x to -x gives zero. Put those together
and (unsigned)-1 is required to give a number which, when added to 1, yields
zero.
Or, as has been explained in far simpler terms, unsigned arithmetic is
done as modulo arithmetic in C.
s***@casperkitty.com
2017-04-28 18:30:22 UTC
Permalink
Post by David Brown
Read the rationale, and you'll see there is no evidence for this claim
whatsoever.
Read it and draw your own conclusions.
Post by David Brown
Just because the standards committee did not write explicitly "lets add
a fun trap to catch people trying to multiply unsigned shorts" in either
the standards or the rationales, does /not/ mean that they really meant
to write "here are all the rules for integer promotions, implicit
conversions, expression evaluation, etc., all written out clearly and
consistently for a balance between compiler freedom to optimise, support
for varied target architectures, efficient code generation, and ease of
learning and ease of use for the programmer. And we'll add one special
exception just for our dear friend Supercat - if you want to multiply
two unsigned shorts, the rules are different."
The rationale notes that on the majority of implementations, signed and
unsigned operations will behave identically except in certain specific
situations. If the authors of the Standard didn't mean to imply that
they would exhibit such behavior *even in cases where the Standard would
not require it* there would have been no need to limit such discussion
merely to those implementations that use silent-wraparound two's-complement
semantics.
Keith Thompson
2017-04-28 15:35:29 UTC
Permalink
Post by j***@gmail.com
Post by Keith Thompson
[...]
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
(I know calling it brain-damaged is too extreme, but ...)
I suggest you might benefit from describing these things in less
inflammatory terms.
Not sure how.
Well, avoiding terms like "brain-damaged" and "purism" would be a good
start.
Post by j***@gmail.com
Post by Keith Thompson
The behavior of left-shifting a negative value is undefined.
Without context, I suppose it is.
What context? The behavior is undefined. The C standard says so.
Of course an implementation can choose to define the behavior
for itself.
Post by j***@gmail.com
But mapping it to a logical shift left on the CPU, whatever the CPU
gives for that, seems reasonable to me.
Sure, that's a reasonable way to implement the "<<" operator, assuming
that the CPU's shift operator implements the required semantics.

But the C standard specifies how programs behave, not what CPU
instructions are used to implement that behavior. And compilers can
and do take advantage of undefined behavior to perform optimizations.

If you want a language where a shift operator means "apply the CPU's
logical shift instruction", you want something other than C.
Post by j***@gmail.com
Far more reasonable than assuming that casting a minus one to an
unsigned type should always result in a bit mask of the type's width.
You don't have to assume it. It follows from the semantics specified by
the C standard, and it's guaranteed.
Post by j***@gmail.com
Post by Keith Thompson
This is specified in N1570 6.5.7p4.
(I don't recall that I've ever felt the need to left-shift a negative
integer.)
Have you ever left-shifted an unsigned char expecting specifically for
the upper bits to disappear, instead of pulling the unsigned char into
an unsigned long to do the shift and then masking the upper bits back
out with a cast back to unsigned char?
Not that I recall. If I needed to shift an unsigned char, I'd probably
cast it to unsigned int to avoid problems. (I'm not particularly happy
that that's needed.)

[...]
Post by j***@gmail.com
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf .
If I had the time, I'd go read it. I'm sure they thought they had a good reason.
I'll still push for splitting the standard if anyone is willing to discuss it.
I'd say that reading and understanding the standard (and the Rationale)
is a prerequisite to making major changes.

[...]

Incidentally, it would be very helpful if you'd format your posts to
about 72 columns. Google Groups makes it unreasonably difficult to do
so.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-27 22:54:04 UTC
Permalink
(Please get a real newsreader and server, rather than google groups.)
<snip>
Post by j***@gmail.com
You know, it's funny.
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an
unsigned fills the thing with ones.
Left shift of negative integers is like most signed integer overflows -
the result is undefined (but an implementation can define it if it wants
to).
Post by j***@gmail.com
(I know calling it brain-damaged is too extreme, but ...)
And I have a hard time remembering that assurance is in the standard.
Maybe it's because I don't really use sign-magnitude that much.
Basically, unsigned arithmetic always works as modulo arithmetic.
Signed arithmetic is not allowed to overflow. It's actually quite
simple and consistent.
Post by j***@gmail.com
The standards committee gets to tilt at windmills, and I get to chase their rainbows.
I really wish the standards committee had instead specified a means
of compile-time testing the architecture for the specific corner- and
edge-cases and providing a means to either pop up a warning to remind
the reader of compiler output that the code wasn't intended to be
compiled on that particular architecture, or use conditional compiled
sections to deal with the corner-case on those CPUs.
There isn't really anything to test at compile time here. Don't let
your signed arithmetic overflow - it does not matter if the underlying
hardware is sign-magnitude, two's complement, or whatever.
Post by j***@gmail.com
A few more hours of their work to help a thousand guys like me to cut
a few wasted days off of tilting with our own windmills every now and
again.
Post by Ben Bacarisse
Post by j***@gmail.com
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires run-time behavior.
sizeof(unsigned long) is fixed at compile time, and even half-decent
compilers will optimise accordingly.
j***@gmail.com
2017-04-28 04:05:31 UTC
Permalink
Post by David Brown
(Please get a real newsreader and server, rather than google groups.)
You got a favorite?

I'm using googlemail because google insists on fighting with sylpheed, but googlecode sure drags the browser through the mud and heats the engines up mercilessly.
Post by David Brown
<snip>
Post by j***@gmail.com
You know, it's funny.
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an
unsigned fills the thing with ones.
Left shift of negative integers is like most signed integer overflows -
the result is undefined (but an implementation can define it if it wants
to).
When you assume that's an overflow and not a truncation, that seems
correct.

I was, for various reasons, looking for the truncation, and now I
see that the standard says I should say, what was it? (Dragging
it in from my source code instead of looking back to the previous
post.) Instead of:

chMask = ( chMask << 1 ) | 1;

I should say

chMask = (unsigned char)
( ( ( (unsigned long) chMask ) << 1 ) | 1 );
Post by David Brown
Post by j***@gmail.com
(I know calling it brain-damaged is too extreme, but ...)
And I have a hard time remembering that assurance is in the standard.
Maybe it's because I don't really use sign-magnitude that much.
Basically, unsigned arithmetic always works as modulo arithmetic.
Signed arithmetic is not allowed to overflow. It's actually quite
simple and consistent.
If you know that unsigned char gets promoted to signed int, as Keith Thompson pointed out to me. Somehow, I missed that all those years ago.
Post by David Brown
Post by j***@gmail.com
The standards committee gets to tilt at windmills, and I get to chase their rainbows.
I really wish the standards committee had instead specified a means
of compile-time testing the architecture for the specific corner- and
edge-cases and providing a means to either pop up a warning to remind
the reader of compiler output that the code wasn't intended to be
compiled on that particular architecture, or use conditional compiled
sections to deal with the corner-case on those CPUs.
There isn't really anything to test at compile time here. Don't let
your signed arithmetic overflow - it does not matter if the underlying
hardware is sign-magnitude, two's complement, or whatever.
Well, you seem to be talking about the shift when I'm talking about
the idiom by that point.

On a sign-magnitude architecture, I think I'd prefer, instead of the
idiom, that

(unsigned char) -1

hmm. What would I prefer?

signed char thing = -1;
unsigned char uthing = (unsigned long) thing;

should give me, depending on the sign-magnitude implementation,
something like 0x81 in uthing.

Is thing promoted before the cast?
Post by David Brown
Post by j***@gmail.com
A few more hours of their work to help a thousand guys like me to cut
a few wasted days off of tilting with our own windmills every now and
again.
Post by Ben Bacarisse
Post by j***@gmail.com
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires run-time behavior.
sizeof(unsigned long) is fixed at compile time, and even half-decent
compilers will optimise accordingly.
But the last time I tried to use sizeof in a context that required
the expression to evaluate to a constant at compile time, the
compiler said I was being a naughty boy and would get no pudding that
day.

I ended up using a bunch of ifdefs to pick the right constant.

The code I'm working on this time should not have that constraint,
although I'm not sure if I want to take the time to go fix it.

Calling my initialization function fixed my problem for now.
It's not exactly critical code, but it's in a blog post where
it might lead some future generations astray. ;-)

--
Joel Rees

Delusions of being a novelist:
http://joel-rees-economics.blogspot.com/2017/01/soc500-00-00-toc.html
David Brown
2017-04-28 08:58:34 UTC
Permalink
Post by j***@gmail.com
Post by David Brown
(Please get a real newsreader and server, rather than google
groups.)
You got a favorite?
A very popular combination is Thunderbird as the newsreader, and
news.eternal-september.org as the free newsserver. It works for Linux,
Windows and Mac. Google groups is fine for researching old posts, but a
poor interface for daily newsgroup usage. And it screws up line splits,
which is a pain for everyone else!

(I don't want to start a discussion about different newsreader clients
here - I just want to let you know that you will be happier, and we will
be happier, if you switch from google groups. Now back to C.)
Post by j***@gmail.com
I'm using googlemail because google insists on fighting with
sylpheed, but googlecode sure drags the browser through the mud and
heats the engines up mercilessly.
Post by David Brown
<snip>
Post by j***@gmail.com
You know, it's funny.
We have this coddling of brain-damaged CPUs when a left-shift on
a negative operand is specified on sign-magnitude architectures
or something, but we have this assurance that casting a -1 to an
unsigned fills the thing with ones.
Left shift of negative integers is like most signed integer
overflows - the result is undefined (but an implementation can
define it if it wants to).
When you assume that's an overflow and not a truncation, that seems
correct.
It /is/ an overflow, not a truncation. Mathematically, left shift is
multiplying by 2. Assuming 16-bit integers because the numbers are
smaller and easier, in what way could 16384 * 2 evaluating to -32768 be
described as a truncation rather than an overflow? That is what happens
when you left-shift a signed integer that is too large.

Or how about -30720 * 2 evaluating to 4096? Left shift of negative
signed integers, implemented as a simple hardware shift, gives
meaningless results.

For /unsigned/ arithmetic with modulo behaviour, you are truncating the
highest bits - that is what modulo does.
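
A small illustration with fixed-width types (a sketch; the signed case is
deliberately left out, since the standard leaves it undefined):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t u = 0xC000;               /* 49152 */
        uint16_t r = (uint16_t)(u << 1);   /* the shift is done after promotion;
                                              storing it back reduces mod 65536 */
        assert(r == 0x8000);               /* the high bit is truncated - well
                                              defined for the unsigned type */
        return 0;
    }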
Post by j***@gmail.com
I was, for various reasons, looking for the truncation, and now I see
that the standard says I should say, what was it? (Dragging it in
from my source code instead of looking back to the previous post.) Instead of:
chMask = ( chMask << 1 ) | 1;
I should say
chMask = (unsigned char) ( ( ( (unsigned long) chMask ) << 1 ) | 1 );
No, it does not say that. It is safe to write

chMask = ( chMask << 1 ) | 1

(By the way, there is only one thing I think is less useful than using
"Systems Hungarian" notation with variable names prefixed by an
abbreviation of their type - and that is to use it incorrectly and
inconsistently. But that is style, and not part of the behaviour of the
code.)
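
The reason it is safe is the integer promotion: the shift is done in int (or
unsigned int), and storing the result back into the unsigned char reduces it
modulo UCHAR_MAX+1. A minimal sketch:

    #include <assert.h>
    #include <limits.h>

    int main(void)
    {
        unsigned char chMask = 0x80;    /* high bit already set */
        chMask = (chMask << 1) | 1;     /* 0x80 promotes to int; 128<<1 is 256,
                                           |1 is 257, stored back mod UCHAR_MAX+1 */
        assert(chMask == (unsigned char)257);   /* == 1 when CHAR_BIT is 8 */
        return 0;
    }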
Post by j***@gmail.com
Post by David Brown
Post by j***@gmail.com
(I know calling it brain-damaged is too extreme, but ...)
And I have a hard time remembering that assurance is in the
standard. Maybe it's because I don't really use sign-magnitude
that much.
Basically, unsigned arithmetic always works as modulo arithmetic.
Signed arithmetic is not allowed to overflow. It's actually quite
simple and consistent.
If you know that unsigned char gets promoted to signed int, as Keith
Thompson pointed out to me. Somehow, I missed that all those years
ago.
I cannot remember when I learned about "integer promotions" - but they
are in §6.3.1.1 of the C11 standard. This newsgroup is a great place to
ask about that sort of thing.

And now you /do/ know about it, allowing you to simplify your code.

(And if ever Keith posts something about the standards that contradicts
what I write, believe Keith.)
Post by j***@gmail.com
Post by David Brown
Post by j***@gmail.com
The standards committee gets to tilt at windmills, and I get to chase their rainbows.
I really wish the standards committee had instead specified a
means of compile-time testing the architecture for the specific
corner- and edge-cases and providing a means to either pop up a
warning to remind the reader of compiler output that the code
wasn't intended to be compiled on that particular architecture,
or use conditional compiled sections to deal with the corner-case
on those CPUs.
There isn't really anything to test at compile time here. Don't
let your signed arithmetic overflow - it does not matter if the
underlying hardware is sign-magnitude, two's complement, or
whatever.
Well, you seem to be talking about the shift when I'm talking about
the idiom by that point.
On a sign-magnitude architecture, I think I'd prefer, instead of the
idiom, that
(unsigned char) -1
hmm. What would I prefer?
signed char thing = -1; unsigned char uthing = (unsigned long)
thing;
should give me, depending on the sign-magnitude implementation,
something like 0x81 in uthing.
Having never used a sign-magnitude architecture (and I hope I never have
to!), I can't say what I would /prefer/ here. But I know how C works,
and it says such conversions to unsigned types are done as modulo.

If you want to get the underlying bit representation of the signed
integer here, then the tool you are looking for is "union" - /not/
conversions.
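
A sketch of the union approach (reading a member other than the one last
written reinterprets the stored representation; a two's-complement machine
prints 0xff here, while sign-magnitude hardware would show something like 0x81):

    #include <stdio.h>

    int main(void)
    {
        union {
            signed char s;
            unsigned char u;
        } rep;

        rep.s = -1;
        /* rep.u is the raw representation of -1, not the value -1
           converted modulo UCHAR_MAX+1 as a cast would give. */
        printf("representation of -1: %#x\n", (unsigned)rep.u);
        return 0;
    }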
Post by j***@gmail.com
Is thing promoted before the cast?
No. (But it would make no difference if it were.)
Post by j***@gmail.com
Post by David Brown
Post by j***@gmail.com
A few more hours of their work to help a thousand guys like me to
cut a few wasted days off of tilting with our own windmills every
now and again.
Post by Ben Bacarisse
byteWidth = i; i = 0; while ( ulroller != 0 ) { ulroller <<=
1; ++i; } ulBytes = i / byteWidth;
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires run-time behavior.
sizeof(unsigned long) is fixed at compile time, and even
half-decent compilers will optimise accordingly.
But the last time I tried to use sizeof in a context that required
the expression to evaluate to a constant at compile time, the
compiler said I was being a naughty boy and would get no pudding that
day.
Ah, there is a big difference between "something that the compiler knows
is constant at compilation time" and "something that the C language says
is a "constant expression". The former can be used for optimisation
purposes, but only the latter can be used in things like "#if" directives.
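
For example (a sketch), sizeof yields a constant the compiler can use anywhere
a constant expression is needed, but not one the pre-processor can see:

    #include <stdio.h>

    enum { UL_SIZE = sizeof (unsigned long) };   /* fine: integer constant expression */
    static char buf[sizeof (unsigned long)];     /* fine: array sizes must be constant */

    /* Not fine - 'sizeof' means nothing to the pre-processor, so this
       directive would be rejected if it were uncommented:
       #if sizeof (unsigned long) == 8
       #endif
    */

    int main(void)
    {
        printf("%d %zu\n", UL_SIZE, sizeof buf);
        return 0;
    }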
Post by j***@gmail.com
I ended up using a bunch of ifdefs to pick the right constant.
Everything you need is conveniently available in <limits.h>, saving you
the effort.
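
For instance (a sketch; it matches the loop versions when unsigned long has no
padding bits), the three values the posted function computes at run time are
available directly:

    #include <limits.h>

    unsigned byteWidth = CHAR_BIT;                /* bits per char */
    unsigned byteMask  = UCHAR_MAX;               /* the all-ones char value */
    unsigned ulBytes   = sizeof (unsigned long);  /* bytes per unsigned long */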
Post by j***@gmail.com
The code I'm working on this time should not have that constraint,
although I'm not sure if I want to take the time to go fix it.
Calling my initialization function fixed my problem for now. It's not
exactly critical code, but it's in a blog post where it might lead
some future generations astray. ;-)
-- Joel Rees
http://joel-rees-economics.blogspot.com/2017/01/soc500-00-00-toc.html
bartc
2017-04-28 09:50:20 UTC
Permalink
Post by David Brown
It /is/ an overflow, not a truncation. Mathematically, left shift is
multiplying by 2.
Only in binary. (I found that out when working in decimal.)
--
bartc
James Kuyper
2017-04-28 14:01:34 UTC
Permalink
On 04/28/2017 12:05 AM, ***@gmail.com wrote:
...
Post by j***@gmail.com
But the last time I tried to use sizeof in a context that required
the expression to evaluate to a constant at compile time, the
compiler said I was being a naughty boy and would get no pudding that
day.
The relevant term is "Integer Constant Expression" (ICE). Unless the
argument of sizeof has a variably modified type, it always meets the
requirements to qualify as an ICE. Whatever problem it was that you ran
into, sizeof(unsigned long) wasn't the issue.
Tim Rentsch
2017-05-02 12:06:18 UTC
Permalink
...
Post by j***@gmail.com
But the last time I tried to use sizeof in a context that required
the expression to evaluate to a constant at compile time, the
compiler said I was being a naughty boy and would get no pudding that
day.
The relevant term is "Integer Constant Expression" (ICE). Unless the
argument of sizeof has a variably modified type, it always meets the
requirements to qualify as an ICE. Whatever problem it was that you ran
into, sizeof(unsigned long) wasn't the issue.
More precisely, unless the operand of sizeof is a variable length
array type (as distinguished from a variably modified type). All
variable length array types are variably modified types, but not
all variably modified types are variable length array types.
Hence the distinction is important: it is only variable length
array types that may require run-time evaluation as operands of
sizeof, and so such expressions are not compile-time constants
(ie, in the sense of "constant expressions" in C).
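
A sketch of the distinction: a pointer to a variable length array has a
variably modified type, yet sizeof applied to the pointer itself is still a
compile-time constant; only an operand whose type is itself a VLA needs
run-time evaluation:

    #include <stdio.h>

    int main(void)
    {
        int n = 5;
        int vla[n];                   /* variable length array type */
        int (*p)[n] = &vla;           /* variably modified, but not itself a VLA */

        printf("%zu\n", sizeof p);    /* constant: just the size of a pointer */
        printf("%zu\n", sizeof vla);  /* evaluated at run time: n * sizeof(int) */
        printf("%zu\n", sizeof *p);   /* run time too: the operand has VLA type */
        return 0;
    }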
Ben Bacarisse
2017-04-28 22:57:28 UTC
Permalink
<snip>
Post by j***@gmail.com
Post by Ben Bacarisse
Apart from the side effect on i this could be written as
chMask = -1;
Post by j***@gmail.com
byteMask = chMask;
Or this could simply be
byteMask = (unsigned char)-1;
<snip>
Post by j***@gmail.com
We have this coddling of brain-damaged CPUs when a left-shift on a
negative operand is specified on sign-magnitude architectures or
something, but we have this assurance that casting a -1 to an unsigned
fills the thing with ones.
<snip>
Post by j***@gmail.com
And I have a hard time remembering that assurance is in the
standard. Maybe it's because I don't really use sign-magnitude that
much.
Sign+magnitude (and 1s complement) are *representations* for signed int,
but much of C is defined at a higher level. What

unsigned char c = -1;

means is defined in terms of *values*: the conversion is done by
repeatedly adding or subtracting one more than the maximum value of the
unsigned type to -1. I.e. UCHAR_MAX+1 is added to -1.

The same is true of shifts. They are defined in terms of values rather
than by making reference to the representation (other than to the
width).
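
Worked through for the unsigned char case (a sketch; the assert holds whatever
representation the machine uses):

    #include <assert.h>
    #include <limits.h>

    int main(void)
    {
        unsigned char c = -1;
        /* -1 is out of range, so UCHAR_MAX + 1 is added once:
           -1 + (UCHAR_MAX + 1) == UCHAR_MAX */
        assert(c == UCHAR_MAX);
        return 0;
    }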

<snip>
--
Ben.
Tim Rentsch
2017-04-30 19:28:51 UTC
Permalink
[some parts re-ordered for continuity]
Post by Ben Bacarisse
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires
run-time behavior.
Do you mean the value produced cannot be used as part of a
pre-processor conditional? The 'sizeof' method does
produce a compile-time constant, albeit one that cannot
be used as part of a pre-processor conditional.
Post by Ben Bacarisse
Post by j***@gmail.com
Yes, there is a reason I am ignoring limits.h. Maybe not a
good reason, but I will debate that later.
I don't think you need limits.h for this.
I wanted to be able to use limits.h for it.
Does that mean you weren't able to? Can you explain why that
was?
I really wish the standards committee had instead specified a
means of compile-time testing the architecture for the specific
corner- and edge-cases and providing a means to either pop up a
warning to remind the reader of compiler output that the code
wasn't intended to be compiled on that particular architecture, or
use conditional compiled sections to deal with the corner-case on
those CPUs.
If what you want is a regular compile-time constant (ie, but not
usable in a pre-processor conditional), that can be done with a
'sizeof' expression:

enum { UL_SIZE = sizeof (unsigned long) };

For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.

If what you want is a value that can be used in a pre-processor
conditional, for the size of one of the integer types defined in
<limits.h> and/or <stdint.h>, and the headers in question are
available, these values can be calculated (making reasonable
assumptions) as illustrated in the following example:

#include <limits.h>

#define MASK_WIDTH( m ) ( 0U + (unsigned)+( \
(m) /((m)%32767 +1) /32767 %32767 *15 \
+ (m) %32767 /((m)%31 +1) /31 %31 *5 \
+ 4 - 12 / ((m)%31 + 3) \
))

#if MASK_WIDTH( ULONG_MAX ) % CHAR_BIT == 0
#define UL_SIZE_PP (MASK_WIDTH( ULONG_MAX ) / CHAR_BIT)
#else
#error Apparently unsigned long has padding bits
#endif

#if UL_SIZE_PP == 4
// unsigned long is 4 bytes
#elif UL_SIZE_PP == 8
// unsigned long is 8 bytes
#else
// huh? unsigned long is neither 4 bytes nor 8 bytes
#endif

(The assumption is that the number of padding bits is either
(a) zero, or (b) non-zero modulo the number of bits in an
unsigned char.)

If there isn't a <limits.h> available then you are pretty
much SOL.
Keith Thompson
2017-04-30 19:41:35 UTC
Permalink
Tim Rentsch <***@alumni.caltech.edu> writes:
[...]
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
I'm not sure how meaningful your point (b) is. *If* the preprocessor
were modified to permit sizeof, then it would be reasonable to
require that for example sizeof (long) must reflect the size of
type long on the target system. In fact, I'd say it would be
unreasonable not to require that. (In reality, the preprocessor
treats integer expressions as if they were of type intmax_t or
uintmax_t. N1570 6.10.1p4.)

That reminds me of a discussion on comp.std.c back in 1998. The
relevant quotation:

| > You are right. It was nice back in the days when things like
| >
| > #if (sizeof(int) == 8)
| >
| > actually worked (on some compilers).
|
| Must have been before my time.
|
| Dennis

Yes, the poster was Dennis Ritchie.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-04-30 20:46:17 UTC
Permalink
Post by Keith Thompson
[...]
I'm not sure how meaningful your point (b) is. *If* the preprocessor
were modified to permit sizeof, then it would be reasonable to
require that for example sizeof (long) must reflect the size of
type long on the target system. In fact, I'd say it would be
unreasonable not to require that. (In reality, the preprocessor
treats integer expressions as if they were of type intmax_t or
uintmax_t. N1570 6.10.1p4.)
That reminds me of a discussion on comp.std.c back in 1998. The
| > You are right. It was nice back in the days when things like
| >
| > #if (sizeof(int) == 8)
| >
| > actually worked (on some compilers).
|
I tried it on my toy compiler. Only a light treatment so the type must
be a single name like 'int' (otherwise the tokeniser needs to know and
duplicate type-decl syntax). Then this was possible:

#if sizeof(int)==8
#message "sizeof(int) == 8"
#else
#message "sizeof(int) != 8"
#endif

('#message' is a debugging feature.)

Not a proposed extension, I just wanted to know how hard it was. Not
very. (Not when the preprocessor is an integrated part of the compiler.)
Post by Keith Thompson
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
Under what circumstances? (Such a feature would mainly be used on the
primitive types.)
--
bartc
Thiago Adams
2017-05-01 12:41:56 UTC
Permalink
Post by bartc
Post by Keith Thompson
[...]
I'm not sure how meaningful your point (b) is. *If* the preprocessor
were modified to permit sizeof, then it would be reasonable to
require that for example sizeof (long) must reflect the size of
type long on the target system. In fact, I'd say it would be
unreasonable not to require that. (In reality, the preprocessor
treats integer expressions as if they were of type intmax_t or
uintmax_t. N1570 6.10.1p4.)
That reminds me of a discussion on comp.std.c back in 1998. The
| > You are right. It was nice back in the days when things like
| >
| > #if (sizeof(int) == 8)
| >
| > actually worked (on some compilers).
|
I tried it on my toy compiler. Only a light treatment so the type must
be a single name like 'int' (otherwise the tokeniser needs to know and
#if sizeof(int)==8
#message "sizeof(int) == 8"
#else
#message "sizeof(int) != 8"
#endif
If the preprocessor is integrated (has access to the symbols)
then this sample and more is possible.

The way I have it, the preprocessor in my C parser can access the AST (built so far).

I could do this:
#if symbol_defined MyFunc || symbol_defined MyVar
#endif

(I liked the #message option)
bartc
2017-05-01 12:57:26 UTC
Permalink
Post by Thiago Adams
Post by bartc
#if sizeof(int)==8
If the preprocessor is integrated (have access to the symbols)
then this sample and more is possible.
The way I have the preprocessor in my C parser can access the AST (build so far).
#if symbol_defined MyFunc || symbol_defined MyVar
#endif
That might be harder, depending on exactly what symbol_defined does.

The problem is that MyFunc and MyVar can have multiple instances
throughout the program. They might be defined, but not visible (be in a
different scope) from the place where you do '#if symbol_defined'.

Understanding blocks and scopes and doing name-resolution is beyond what
a tokeniser can and ought to do, I think. (Maybe a job for your
parser-level conditional code.)

In the case of 'sizeof', the 'int' is treated as a reserved word
independent of scope. ('int' can be #defined, but the replacement text
is then used for sizeof.)
--
bartc
Tim Rentsch
2017-05-02 12:30:19 UTC
Permalink
[...]
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
Under what circumstances? (Such a feature would mainly be used
on the primitive types.)
Try running the following program (ie, on a 32-bit platform).

#include <stdio.h>

int
main(){
if( -1U == 0xFFFFFFFF ){
printf( " unsigned is 32 bits\n" );
} else {
printf( " unsigned is NOT 32 bits\n" );
}

#if -1U == 0xFFFFFFFF
printf( " unsigned in pre-processor is 32 bits\n" );
#else
printf( " unsigned in pre-processor is NOT 32 bits\n" );
#endif

return 0;
}
bartc
2017-05-02 15:13:00 UTC
Permalink
Post by Tim Rentsch
[...]
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
Under what circumstances? (Such a feature would mainly be used
on the primitive types.)
Try running the following program (ie, on a 32-bit platform).
#include <stdio.h>
int
main(){
if( -1U == 0xFFFFFFFF ){
printf( " unsigned is 32 bits\n" );
} else {
printf( " unsigned is NOT 32 bits\n" );
}
#if -1U == 0xFFFFFFFF
printf( " unsigned in pre-processor is 32 bits\n" );
#else
printf( " unsigned in pre-processor is NOT 32 bits\n" );
#endif
return 0;
}
What's supposed to be the correct output?

Out of 5 C compilers, two say is/is NOT, three say is/is. (And actually
the four that also work in 64 bits give the same results in 64 bits.)

(Mine [64-bits] says is/is NOT, after it was tweaked to take
notice of suffixes such as U and LL which it previously ignored.)

(BTW what is the type of 0xFFFFFFFF ? My reading of the standard
suggests it is long long int, ie. 0x00000000FFFFFFFF.

While the 1U is 32-bit 0x00000001 unsigned, negated to 0xFFFFFFFF, then
widened to 0x00000000FFFFFFFF to match the rhs. The latter is converted
to unsigned because of the mixed operation. So the comparison is done in
64 bits. But I might be wrong...)


Anyway, I don't think this is related to whether or not a specific type,
rather than an evaluation of an expression, might be different in the
preprocessor.
--
bartc
Keith Thompson
2017-05-02 16:12:55 UTC
Permalink
[...]
Post by bartc
Post by Tim Rentsch
Try running the following program (ie, on a 32-bit platform).
#include <stdio.h>
int
main(){
if( -1U == 0xFFFFFFFF ){
printf( " unsigned is 32 bits\n" );
} else {
printf( " unsigned is NOT 32 bits\n" );
}
#if -1U == 0xFFFFFFFF
printf( " unsigned in pre-processor is 32 bits\n" );
#else
printf( " unsigned in pre-processor is NOT 32 bits\n" );
#endif
return 0;
}
What's supposed to be the correct output?
It should be

unsigned is 32 bits
unsigned in pre-processor is NOT 32 bits
Post by bartc
Out of 5 C compilers, two say is/is NOT, three say is/is. (And actually
the four that also work in 64 bits, give the same results in 64 bits.)
It seems you have three non-conforming compilers.

N1570 6.10.1p4:

For the purposes of this token conversion and evaluation, all
signed integer types and all unsigned integer types act as if
they have the same representation as, respectively, the types
intmax_t and uintmax_t defined in the header <stdint.h>.

intmax_t and uintmax_t must be at least 64 bits.
Post by bartc
(Mine [64-bits] says is/is NOT, after it was tweaked to pay to take
notice of suffixes such as U and LL which it previously ignored.)
(BTW what is the type of 0xFFFFFFFF ? My reading of the standard
suggests it is long long int, ie. 0x00000000FFFFFFFF.
The type of an unsuffixed hexadecimal integer constant is the first of
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
in which its value fits. Since unsigned long int is at least 32 bits,
it can't be long long int or unsigned long long int. If int is 32 bits,
then the type of 0xFFFFFFFF is unsigned int.

In a preprocessor expression, unsigned int acts like uintmax_t (probably
64 bits).
Post by bartc
While the 1U is 32-bit 0x00000001 unsigned, negated to 0xFFFFFFFF, then
widened to 0x00000000FFFFFFFF to match the rhs. The latter is converted
to unsigned because of the mixed operation. So the comparison is done in
64 bits. But I might be wrong...)
See above. In a non-preprocessor expression, it's not widened, since
0xFFFFFFFF is (probably) of type unsigned int. In a preprocessor
expression, it's not widened because it's already 64 bits.
Post by bartc
Anyway, I don't think this is related to whether or not a specific type,
rather than an evaluation of an expression, might be different in the
preprocessor.
It is.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
bartc
2017-05-02 17:15:29 UTC
Permalink
Post by Keith Thompson
Post by bartc
Anyway, I don't think this is related to whether or not a specific type,
rather than an evaluation of an expression, might be different in the
preprocessor.
It is.
How?

For preprocessor expressions used in #if, the PP just uses the maximum
int width available, which appears to be long long or 64 bits even when
the compiler has a 32-bit target and 32-bit ints.

But I don't think the type of expressions used in #if is anything to do
with the types normally used in the code. Unless someone specifically
wants to make use of intmax_t.

Even then, I can't see how that can be different between preprocessor
and the rest of the compiler, unless someone deliberately redefines
intmax_t. Which they can do for any type, whether the preprocessor is
involved or not.
--
bartc
Tim Rentsch
2017-04-30 21:42:51 UTC
Permalink
[...]
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
I'm not sure how meaningful your point (b) is. [.. sizeof
in the pre-processor could use execution types rather than
pre-processor types ..]
I mentioned point (b) as a reminder to the unwary, of which I
have to count myself as one. When I was first writing a reply to
the previous message, I thought it would be possible to get by
without using <limits.h>. To do that I used the MASK_WIDTH()
macro on (-1UL), and was baffled for longer than I care to admit
when I got 8 bytes as the size of unsigned long even though
sizeof (unsigned long) is 4 on that platform. Of course once I
realized what had happened I just slapped my forehead, but I
thought I might save other people some trouble...
j***@gmail.com
2017-05-01 06:57:38 UTC
Permalink
Post by Tim Rentsch
[some parts re-ordered for continuity]
Thank you for doing that.
Post by Tim Rentsch
Post by Ben Bacarisse
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires
run-time behavior.
Do you mean the value produced cannot be used as part of a
pre-processor conditional?
Precisely. (I think.)
Post by Tim Rentsch
The 'sizeof' method does
produce a compile-time constant, albeit one that cannot
be used as part of a pre-processor conditional.
There is a difference?

Oh. Yeah. I guess I want to say I needed a "pre-processor-time
constant."
Post by Tim Rentsch
Post by Ben Bacarisse
Post by j***@gmail.com
Yes, there is a reason I am ignoring limits.h. Maybe not a
good reason, but I will debate that later.
I don't think you need limits.h for this.
I wanted to be able to use limits.h for it.
Does that mean you weren't able to? Can you explain why that
was?
I was trying to define a union that would describe all the things
that a C program would need to know could be pushed on a Forth
interpreter stack.

Or something along those lines.

I think I needed to know how many bits wide the resulting union
was. Or something like that. Not so much needing to know, but
needing to be sure the program would compile with a sufficiently
wide stack element size. And be sure that I could get the unions
on and off the stack reliably.

Could not work out a valid pre-processor expression from what was
available at the time in a standard limits.h.

Byte order issues were mixed in there, but I ultimately ignored
the byte order within the stack elements.
Post by Tim Rentsch
I really wish the standards committee had instead specified a
means of compile-time testing the architecture for the specific
corner- and edge-cases and providing a means to either pop up a
warning to remind the reader of compiler output that the code
wasn't intended to be compiled on that particular architecture, or
use conditional compiled sections to deal with the corner-case on
those CPUs.
If what you want is a regular compile-time constant (ie, but not
usable in a pre-processor conditional), that can be done with a
enum { UL_SIZE = sizeof (unsigned long) };
And that would be available way too late.
Post by Tim Rentsch
For use in pre-processor conditionals, generally <limits.h> is
needed, because (a) sizeof doesn't work in #if etc, and (b)
even if it did, the sizes of types are potentially different in
the pre-processor than they are in the program.
And, since you mention it, I had to work through that question,
too. I'm not sure I solved it.

Ended up using a program to test the architecture and generate a
header file that the header file would include.

And now, since the waste product has hit the fan about the gcc
compiler engineers deciding optimization was such a high priority
they would just start silently disposing of undefined behaviors
and implementing implementation-defined behaviors with disappearing
code, I'm wondering how long it will be until the program I wrote
to test the architecture will spit out a header where everything
gets defined to empty expressions.
Post by Tim Rentsch
If what you want is a value that can be used in a pre-processor
conditional, for the size of one of the integer types defined in
<limits.h> and/or <stdint.h>, and the headers in question are
available, these values can be calculated (making reasonable
#include <limits.h>
#define MASK_WIDTH( m ) ( 0U + (unsigned)+( \
(m) /((m)%32767 +1) /32767 %32767 *15 \
+ (m) %32767 /((m)%31 +1) /31 %31 *5 \
+ 4 - 12 / ((m)%31 + 3) \
))
#if MASK_WIDTH( ULONG_MAX ) % CHAR_BIT == 0
#define UL_SIZE_PP (MASK_WIDTH( ULONG_MAX ) / CHAR_BIT)
#else
#error Apparently unsigned long has padding bits
#endif
#if UL_SIZE_PP == 4
// unsigned long is 4 bytes
#elif UL_SIZE_PP == 8
// unsigned long is 8 bytes
#else
// huh? unsigned long is neither 4 bytes nor 8 bytes
#endif
(The assumption is that the number of padding bits is either
(a) zero, or (b) non-zero modulo the number of bits in an
unsigned char.)
I didn't do nearly as much math, but that's what I did until I
hit the problem I'm referring to.
Post by Tim Rentsch
If there isn't a <limits.h> available then you are pretty
much SOL.
Oh, I used it, just wasn't able to produce the constant I needed
with limits.h by itself.

I don't know whether you want to see the contorted mess, but
the project is here

https://sourceforge.net/projects/bif-c/

the header where I manufactured the union is here

https://sourceforge.net/p/bif-c/code/HEAD/tree/trunk/bifu_i.h

the program for generating the header is in this subdirectory

https://sourceforge.net/p/bif-c/code/HEAD/tree/trunk/configs/

The preprocessor computed version, similar to what you suggest,
is in r15, here

https://sourceforge.net/p/bif-c/code/15/tree/trunk/bifu_i.h

starting with the comments on line 56.

FWIW.

--
Joel Rees
http://reiisi.blogspot.com
Tim Rentsch
2017-05-02 11:32:18 UTC
Permalink
Post by j***@gmail.com
Post by Tim Rentsch
[some parts re-ordered for continuity]
Thank you for doing that.
Post by Tim Rentsch
Post by Ben Bacarisse
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
What's wrong with
ulBytes = sizeof (unsigned long);
?
Well, since you ask, you're right. Either way requires
run-time behavior.
Do you mean the value produced cannot be used as part of a
pre-processor conditional?
Precisely. (I think.)
Post by Tim Rentsch
The 'sizeof' method does
produce a compile-time constant, albeit one that cannot
be used as part of a pre-processor conditional.
There is a difference?
Oh. Yeah. I guess I want to say I needed a "pre-processor-time
constant."
Post by Tim Rentsch
Post by Ben Bacarisse
Post by j***@gmail.com
Yes, there is a reason I am ignoring limits.h. Maybe not a
good reason, but I will debate that later.
I don't think you need limits.h for this.
I wanted to be able to use limits.h for it.
Does that mean you weren't able to? Can you explain why that
was?
I was trying to define a union that would describe all the things
that a C program would need to know could be pushed on a Forth
interpreter stack.
Or something along those lines.
I think I needed to know how many bits wide the resulting union
was. Or something like that. Not so much needing to know, but
needing to be sure the program would compile with a sufficiently
wide stack element size. And be sure that I could get the unions
on and off the stack reliably.
Could not work out a valid pre-processor expression from what was
available at the time in a standard limits.h.
[...]
Ended up using a program to test the architecture and generate a
header file that the header file would include.
Okay...
Post by j***@gmail.com
Post by Tim Rentsch
[...how to calculate bit widths in the pre-processor]
I didn't do nearly as much math, but that's what I did until I
hit the problem I'm referring to.
Post by Tim Rentsch
If there isn't a <limits.h> available then you are pretty
much SOL.
Oh, I used it, just wasn't able to produce the constant I needed
with limits.h by itself.
I don't know whether you want to see the contorted mess, but
the project is here
https://sourceforge.net/projects/bif-c/
the header where I manufactured the union is here
https://sourceforge.net/p/bif-c/code/HEAD/tree/trunk/bifu_i.h
the program for generating the header is in this subdirectory
https://sourceforge.net/p/bif-c/code/HEAD/tree/trunk/configs/
The preprocessor computed version, similar to what you suggest,
is in r15, here
https://sourceforge.net/p/bif-c/code/15/tree/trunk/bifu_i.h
starting with the comments on line 56.
I downloaded the specific files you mentioned, and then the
entire project. Looking through the code, AFAICT it should
be fairly straightforward to do what you want to do using
just <limits.h> and pre-processor tests (and #defines, etc).
Of course I may have missed or misunderstood something.
Can you explain more specifically what value you need to
produce? It would help if you could give both a description
and an example case (or cases plural) along with the value(s)
for the example case(s).
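A sketch of the kind of <limits.h> pre-processor test that might be
meant here, assuming no padding bits and 8-bit chars (UL_SIZE_PP is an
invented name):

#include <limits.h>

#if ULONG_MAX == 0xFFFFFFFFUL
#define UL_SIZE_PP 4
#elif (ULONG_MAX >> 31) >> 1 == 0xFFFFFFFFUL  /* i.e. ULONG_MAX >> 32 */
#define UL_SIZE_PP 8
#else
#error unsigned long is neither 4 nor 8 bytes
#endif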
James Kuyper
2017-04-27 14:46:40 UTC
Permalink
Post by j***@gmail.com
----------------
unsigned byteWidth = 8; /* Not depending on limits.h . */
unsigned byteMask = 0xFF;
unsigned ulBytes = 4; /* Sane throw-away initial values. */
void setULbytes( void )
{ int i = 0;
unsigned char chroller = 1;
unsigned char chMask = 1;
unsigned long ulroller = 1;
while ( chroller != 0 )
{ chroller <<= 1;
chMask = ( chMask << 1 ) | 1;
++i;
}
byteMask = chMask;
byteWidth = i;
i = 0;
while ( ulroller != 0 )
{ ulroller <<= 1;
++i;
}
ulBytes = i / byteWidth;
}
----------------
Yes, there is a reason I am ignoring limits.h. Maybe not a good
reason, but I will debate that later.
Right now I just would like to hear whether shifting unsigned values
until the bits are supposed to disappear, and testing them for zero,
For unsigned types, the behavior of such shifts is well defined. The
disappearance of bits that you refer to is described in the standard
using the phrase "... reduced modulo one more than the maximum value
representable in the result type." (6.5.7p4)
So long as UCHAR_MAX < INT_MAX, your unsigned char value will get
promoted to "int", which is a signed type. However, unless 2U*UCHAR_MAX >
INT_MAX, you won't run into trouble with this code, since you never
shift by more than 1 bit. If you want to avoid worrying about that
remote possibility, use unsigned int for all of the intermediate steps
in this process.
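A sketch of that suggestion applied to the byte-width loop (an
illustrative rewrite, not the original poster's code):

/* Probe the width of unsigned char using unsigned intermediates,
   so the shift can never overflow a signed int after promotion. */
unsigned probeCharWidth( void )
{ unsigned width = 0;
  unsigned roller = 1;
  while ( (unsigned char)roller != 0 )
  { roller = (unsigned char)( roller << 1 );
    ++width;
  }
  return width;   /* number of bits in an unsigned char */
}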
Post by j***@gmail.com
is considered in the range of machine-defined stuff that the current
crop of optimization sophomores (pardon my Latin) conflates with
undefined.
Well, you don't have to be an optimization sophomore to understand
correctly that the behavior is undefined for signed types. All you have
to do is be able to read English. Later in the same paragraph it says
(about E1 << E2) that "If E1 has a signed type and nonnegative value,
and E1 × 2^E2 is representable in the result type, then that is the
resulting value; otherwise, the behavior is undefined."

Note that "machine-defined" is a meaningless concept in this context. A
machine might have an instruction that could be used to implement a
bit-wise shift, but an implementation of C targeting that machine is
under no obligation to translate a bit-wise shift expression into that
instruction. When the behavior is well-defined, the implementation is
required to produce the specified behavior, but using that particular
machine instruction is not the only way that could be achieved. Sane
implementors will use some other method only if they think they have a
good reason for doing so. However, if they do think that, the standard
says nothing to prohibit them from doing so.

When the behavior is undefined, the standard, by definition, imposes no
obligations whatsoever on the implementation. In particular, the
implementation is not obligated to produce the same behavior that would
have occurred if it had used that particular machine instruction to
implement the shift. If there's any good reason for using any other way
of achieving the same result when the behavior is defined, that
alternative way has a good chance of NOT producing the same behavior
when the behavior is undefined.
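When the bit pattern of a signed value really does need to be shifted,
the usual detour is to do the shift in the corresponding unsigned type
and convert back; converting an out-of-range value to a signed type is
implementation-defined (or raises an implementation-defined signal),
not undefined. A sketch:

/* Shift the representation left without signed-shift UB.
   n must be less than the width of unsigned long. */
long shl_pattern( long v, unsigned n )
{ return (long)( (unsigned long)v << n );
}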
s***@casperkitty.com
2017-04-27 15:37:39 UTC
Permalink
Post by James Kuyper
Well, you don't have to be an optimization sophomore to understand
correctly that the behavior is undefined for signed types. All you have
to do is be able to read English. Later in the same paragraph it says
(about E1 << E2) that "If E1 has a signed type and nonnegative value,
and E1 × 2^E2 is representable in the result type, then that is the
resulting value; otherwise, the behavior is undefined."
The clear intention of the Standard is that on a sign-magnitude platform
whose hardware may randomly catch fire if the left-shift instruction is
fed a negative number, a compiler for such a platform would have no
obligation to prevent that from happening. I see no evidence that the
authors of the Standard expected or intended that implementations
targeting two's-complement hardware should not opt to behave "in a
documented manner characteristic of the environment". The authors of C89
deliberately avoided defining behaviors that would be applicable to some
machines but not others, figuring that compiler writers and programmers
alike should be able to recognize the obvious useful cases.

Incidentally, I'm curious about which standard-conforming C compiler for
a two's-complement machine was the first one to perform an optimization
by treating a left shift of a negative number as something other than
repeated multiplication by two in cases where the latter would not
overflow. I would guess it was this century, but it would be interesting
to know precisely.
Keith Thompson
2017-04-27 16:38:51 UTC
Permalink
***@casperkitty.com writes:
[...]
Post by s***@casperkitty.com
The clear intention of the Standard is that on a sign-magnitude platform
whose hardware may randomly catch fire if the left-shift instruction is
fed a negative number, a compiler for such a platform would have no
obligation to prevent that from happening. I see no evidence that the
authors of the Standard expected or intended that implementations
targeting two's-complement hardware should not opt to behave "in a
documented manner characteristic of the environment". The authors of C89
deliberately avoided defining behaviors that would be applicable to some
machines but not others, figuring that compiler writers and programmers
alike should be able to recognize the obvious useful cases.
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. "behaving
during translation or program execution in a documented manner
characteristic of the environment" (that's quoted from a note on
the definition of "undefined behavior") is just one of the possible
behaviors, and it is not emphasized over any of the others.

An optimizing compiler may assume that the left operand of a left
shift operator is non-negative. If that assumption is violated,
the resulting code may behave differently than if the compiler
generated a shift instruction. A compiler might also generate code
to check that the left operand is non-negative and trap if it isn't.

Don't you get tired of making the same point again and again
and again?
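As an aside, a hand-rolled sketch of that check-and-trap behaviour
(this is not claimed to be what any particular compiler emits):

#include <assert.h>
#include <limits.h>

/* Aborts instead of invoking undefined behaviour. */
static long checked_shl( long v, unsigned n )
{ assert( n < sizeof (long) * CHAR_BIT ); /* shift count in range     */
  assert( v >= 0 );                       /* no negative left operand */
  assert( v <= ( LONG_MAX >> n ) );       /* result is representable  */
  return v << n;
}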
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-27 17:17:51 UTC
Permalink
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. "behaving
during translation or program execution in a documented manner
characteristic of the environment" (that's quoted from a note on
the definition of "undefined behavior") is just one of the possible
behaviors, and it is not emphasized over any of the others.
Such behavior wouldn't be applicable on all machines, and the Standard
seeks to avoid showing favoritism to particular architectures.

On the other hand, I'm curious what evidence you could find from before
2000 that would suggest that left-shifting of negative numbers would have
been expected to have any adverse implications *other than incompatibility
with non-two's-complement hardware, or with configuration options that
would explicitly trap it*.
Post by Keith Thompson
An optimizing compiler may assume that the left operand of a left
shift operator is non-negative. If that assumption is violated,
the resulting code may behave differently than if the compiler
generated a shift instruction. A compiler might also generate code
to check that the left operand is non-negative and trap if it isn't.
Doing so would not make a compiler non-conforming. It would merely limit
the range of purposes for which the compiler was suitable.
Post by Keith Thompson
Don't you get tired of making the same point again and again
and again?
Meh--I'm a glutton for punishment.
Keith Thompson
2017-04-27 19:10:00 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. "behaving
during translation or program execution in a documented manner
characteristic of the environment" (that's quoted from a note on
the definition of "undefined behavior") is just one of the possible
behaviors, and it is not emphasized over any of the others.
Such behavior wouldn't be applicable on all machines, and the Standard
seeks to avoid showing favoritism to particular architectures.
On the other hand, I'm curious what evidence you could find from before
2000 that would suggest that left-shifting of negative numbers would have
been expected to have any adverse implications *other than incompatibility
with non-two's-complement hardware, or with configuration options that
would explicitly trap it*.
I haven't said or implied that there is any such evidence, and I'm not
sufficiently interested in the question to go looking. I find the
wording of the standard clear enough.

Left-shifting of negative numbers is not an operation that's
supported by C, so I'm not going to try to use it.
Post by s***@casperkitty.com
Post by Keith Thompson
An optimizing compiler may assume that the left operand of a left
shift operator is non-negative. If that assumption is violated,
the resulting code may behave differently than if the compiler
generated a shift instruction. A compiler might also generate code
to check that the left operand is non-negative and trap if it isn't.
Doing so would not make a compiler non-conforming. It would merely limit
the range of purposes for which the compiler was suitable.
It would be more useful for the purpose of detecting code whose behavior
is undefined.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-04-30 20:50:24 UTC
Permalink
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. [...]
Did you mean the C99 standard? In C89/C90 left shifting
a negative value is implementation-defined behavior, not
undefined behavior.
Keith Thompson
2017-04-30 21:11:53 UTC
Permalink
Post by Tim Rentsch
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. [...]
Did you mean the C99 standard? In C89/C90 left shifting
a negative value is implementation-defined behavior, not
undefined behavior.
No, I *meant* C89, but I was mistaken.

However, C89/C90 doesn't say that a left shift of a negative value has
implementation-defined behavior, at least not explicitly. (It does say
so for right shift.) It defines the behavior in terms of bit positions,
not values. It then clarifies the semantics in terms of values when E1
has an unsigned type, but not when E1 has a signed type. I suppose you
could say that the result depends on the representation, which is
implementation-defined.
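On a two's-complement implementation, reading the C90 wording as a
statement about bit positions gives the "obvious" result; for example
(shown for illustration, not as a recommendation to shift negative
values):

#include <stdio.h>

int main( void )
{ long v = -1L;
  unsigned long bits = (unsigned long)v;  /* 0xFF...FF on two's complement */
  long shifted = (long)( bits << 1 );     /* conversion back is
                                             implementation-defined */
  printf( "%ld\n", shifted );             /* typically prints -2 */
  return 0;
}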
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-04-30 21:44:11 UTC
Permalink
Post by Keith Thompson
Post by Tim Rentsch
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. [...]
Did you mean the C99 standard? In C89/C90 left shifting
a negative value is implementation-defined behavior, not
undefined behavior.
No, I *meant* C89, but I was mistaken.
However, C89/C90 doesn't say that a left shift of a negative value has
implementation-defined behavior, at least not explicitly. (It does say
so for right shift.) It defines the behavior in terms of bit positions,
not values. It then clarifies the semantics in terms of values when E1
has an unsigned type, but not when E1 has a signed type. I suppose you
could say that the result depends on the representation, which is
implementation-defined.
I see you have responded to my other posting so I will
follow up there.
j***@gmail.com
2017-05-01 05:05:48 UTC
Permalink
Post by Keith Thompson
Post by Tim Rentsch
Post by Keith Thompson
The authors of the C89 standard deliberately stated that a left
shift on a negative value has undefined behavior. [...]
Did you mean the C99 standard? In C89/C90 left shifting
a negative value is implementation-defined behavior, not
undefined behavior.
No, I *meant* C89, but I was mistaken.
However, C89/C90 doesn't say that a left shift of a negative value has
implementation-defined behavior, at least not explicitly. (It does say
so for right shift.) It defines the behavior in terms of bit positions,
not values. It then clarifies the semantics in terms of values when E1
has an unsigned type, but not when E1 has a signed type. I suppose you
could say that the result depends on the representation, which is
implementation-defined.
Thanks, Keith.

That clears up a lot for me.

--
Joel Rees
http://reiisi.blogspot.com