Discussion:
Is this result against the standard?
i***@gmail.com
2017-05-02 10:41:25 UTC
Permalink
PLEASE ASSUME THAT INT OCCUPIES 4 BYTES. <<
I compiled the code,

---------------------------------------------

#include <stdio.h>

struct s {
    int a : 6;
    _Bool b : 1;
    _Bool c : 1;
    _Bool d : 1;
    _Bool e : 1;
    _Bool f : 1;
    _Bool g : 1;
    int h : 12;
};

void main(void) {
    printf("%d\n", sizeof(struct s));
}

---------------------------------------------

and the output was a bit unexpected.

---------------------------------------------

12

---------------------------------------------

As stated by the C11 draft,

... If enough space remains, a bit-field that immediately follows another
bit-field in a structure shall be packed into adjacent bits of the same unit...

I expected it to fit into 4 bytes since I used a 32-bit compiler.
Specifically, I used gcc (tdm-1) 5.1.0. Is it against the standard?
Keith Thompson
2017-05-02 15:42:06 UTC
Permalink
Post by i***@gmail.com
PLEASE ASSUME THAT INT OCCUPIES 4 BYTES. <<
I compiled the code,
---------------------------------------------
#include <stdio.h>
struct s {
int a : 6;
_Bool b : 1;
_Bool c : 1;
_Bool d : 1;
_Bool e : 1;
_Bool f : 1;
_Bool g : 1;
int h : 12;
};
void main(void) {
"void main(void)" should be "int main(void)
Post by i***@gmail.com
printf("%d\n", sizeof(struct s));
"%d" should be "%zu".
Post by i***@gmail.com
}
---------------------------------------------
and the output was a bit unexpected.
---------------------------------------------
12
---------------------------------------------
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how the
bit fields are actually allocated. A compiler may legally add arbitrary
padding at the end of a structure, so the output you're seeing is
consistent with all the bit fields being allocated correctly. But it's
unlikely that a compiler would allocate more padding at the end of a
structure than is actually needed for correct alignment.

The output I get on several versions of gcc is 4, but I do get 12 using
tcc (this is on Linux x86_64). A version of your program that shows
where the bit fields are actually allocated indicates that tcc allocates
the 6-bit int bit field in one 32-bit word, the 6 _Bool bit fields in
another 32-bit word, and the final 12-bit int bit field in another
32-bit word.

Here's the program (it's a bit long):

#include <stdio.h>
#include <string.h>

static void dump(void *base, size_t size) {
    for (unsigned char *p = base; p < (unsigned char*)base + size; p++) {
        printf("%02x", *p);
    }
    putchar('\n');
}

struct s {
    int a : 6;
    _Bool b : 1;
    _Bool c : 1;
    _Bool d : 1;
    _Bool e : 1;
    _Bool f : 1;
    _Bool g : 1;
    int h : 12;
};

int main(void) {
    printf("sizeof (struct s) = %zu\n", sizeof(struct s));
    struct s obj;

    fputs("int a:6 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.a = -1;
    dump(&obj, sizeof obj);

    fputs("_Bool b:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.b = 1;
    dump(&obj, sizeof obj);

    fputs("_Bool c:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.c = 1;
    dump(&obj, sizeof obj);

    fputs("_Bool d:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.d = 1;
    dump(&obj, sizeof obj);

    fputs("_Bool e:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.e = 1;
    dump(&obj, sizeof obj);

    fputs("_Bool f:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.f = 1;
    dump(&obj, sizeof obj);

    fputs("_Bool g:1 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.g = 1;
    dump(&obj, sizeof obj);

    fputs("int h:12 ", stdout);
    memset(&obj, 0, sizeof obj);
    obj.h = -1;
    dump(&obj, sizeof obj);

    return 0;
}

and here's the output I get using gcc:

sizeof (struct s) = 4
int a:6 3f000000
_Bool b:1 40000000
_Bool c:1 80000000
_Bool d:1 00010000
_Bool e:1 00020000
_Bool f:1 00040000
_Bool g:1 00080000
int h:12 00f0ff00

Using tcc, I get:

sizeof (struct s) = 12
int a:6 3f0000000000000000000000
_Bool b:1 000000000100000000000000
_Bool c:1 000000000200000000000000
_Bool d:1 000000000400000000000000
_Bool e:1 000000000800000000000000
_Bool f:1 000000001000000000000000
_Bool g:1 000000002000000000000000
int h:12 0000000000000000ff0f0000

which I believe is non-conforming.
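Decoding the little-endian byte dumps above gives the following picture
(a sketch of my reading, not something either compiler prints):

/* gcc: everything packed into one 32-bit unit
 *   bits  0- 5  a       bits  6-11  b..g
 *   bits 12-23  h       bits 24-31  padding
 *
 * tcc: a new 32-bit unit at each change of declared type
 *   unit 0: a (bits 0-5)
 *   unit 1: b..g (bits 0-5)
 *   unit 2: h (bits 0-11)
 */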
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-02 21:35:04 UTC
Permalink
Post by Keith Thompson
Post by i***@gmail.com
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how the
bit fields are actually allocated. A compiler may legally add arbitrary
padding at the end of a structure, so the output you're seeing is
consistent with all the bit fields being allocated correctly. But it's
unlikely that a compiler would allocate more padding at the end of a
structure than is actually needed for correct alignment.
In the earlier K&R days, the "unit" in the phrase "the same unit" referred
to an object of the specified member type. Thus, on a system with 8/16/32-bit
char/short/long, given the structure:

struct foo {
    char a : 6;
    char b : 6;
    char c : 6;
    char d : 6;
};

the objects would be placed into four char-sized units; if the structure
had been written as

struct foo {
    short a : 6;
    short b : 6;
    short c : 6;
    short d : 6;
};

they would have been packed into two short-sized units. If written as

struct foo {
    long a : 6;
    long b : 6;
    long c : 6;
    long d : 6;
};

they would have all been packed into one 32-bit unit. If types were mixed
and matched, each type that was something other than a signed or unsigned
version of the previous type would start a new unit.
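Applied to the struct from the original post, that per-type rule would
presumably give something like this (just a sketch, assuming 32-bit int
and the usual 4-byte int alignment):

struct s {
    int   a : 6;   /* unit 1: an int-sized unit, 6 bits used       */
    _Bool b : 1;   /* unit 2: type changed, so a new unit begins   */
    _Bool c : 1;
    _Bool d : 1;
    _Bool e : 1;
    _Bool f : 1;
    _Bool g : 1;   /* b..g share unit 2                            */
    int   h : 12;  /* unit 3: type changed again, another new unit */
};
/* three units, padded to int alignment -> the 12-byte size reported
   earlier in the thread */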

When bitfields were added to the Standard, it was changed to allow a bit
more flexibility, thus allowing an implementation given one of the above
declarations to, at its option, store the four fields among three bytes
without wasted space. The Standard does not appear to say how an
implementation should decide upon what kind of unit to use for storing bit
fields, and nothing in the rationale suggests an intention to forbid
implementations from using the K&R style layout (since code that cares
about bit-field layout is inherently going to be non-portable, forbidding
such layouts would break existing code for essentially zero benefit). While
the tcc behavior is no longer fashionable, that does not imply it is "broken".
Keith Thompson
2017-05-02 22:23:09 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Post by i***@gmail.com
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how the
bit fields are actually allocated. A compiler may legally add arbitrary
padding at the end of a structure, so the output you're seeing is
consistent with all the bit fields being allocated correctly. But it's
unlikely that a compiler would allocate more padding at the end of a
structure than is actually needed for correct alignment.
[...]
Post by s***@casperkitty.com
When bitfields were added to the Standard, it was changed to allow a
bit more flexibility, thus allowing an implementation given one of the
above declarations to, at its option, store the four fields among
three bytes without wasted space. The Standard does not appear to say
how an implementation should decide upon what kind of unit to use for
storing bit fields, and nothing in the rationale suggests an intention
to forbid implementations from using the K&R style layout (since code
that cares about bit-field layout is inherently going to be
non-portable, forbidding such layouts would break existing code for
essentially zero benefit). While the tcc behavior is no longer
fashionable, that does not imply it is "broken".
What the standard says is:

An implementation may allocate any addressable storage unit large
enough to hold a bitfield. If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit.

The structure in the original post was:

struct s {
int a : 6;
_Bool b : 1;
/* ... */
};

Whatever kind of "unit" the implementation chooses to use, after 6 bits
have been allocated for `a`, there is clearly enough remaining space for
`b`. tcc allocates `b` in a new 32-bit word. It refers to "a bit-field
that immediately follows another bit-field"; it makes no exception for
bit-fields of different types. I cannot think of any reasonable
interpretation by which that is conforming.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nick Bowler
2017-05-03 14:31:01 UTC
Permalink
Post by Keith Thompson
Post by Keith Thompson
Post by i***@gmail.com
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how
the bit fields are actually allocated.
Even if we know that the bit-fields are actually allocated contrary
to the rules, it's only a conformance issue if a strictly conforming
program can tell the difference between the way they are allocated
and every possible valid interpretation of the rules.
Post by Keith Thompson
An implementation may allocate any addressable storage unit large
enough to hold a bitfield. If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit.
struct s {
int a : 6;
_Bool b : 1;
/* ... */
};
Whatever kind of "unit" the implementation chooses to use, after 6 bits
have been allocated for `a`, there is clearly enough remaining space for
`b`.
I think there's some shaky foundation here because the standard does
not formally define "addressable" nor does it formally define "storage
unit", though there may be definitions of these terms in one of the
standard's normative references.

We can infer some meaning from other definitions... from the definition
of "byte" we see that bytes definitely count as an "addressable storage
unit". The definition of "bit" shows that bits are "storage units" and
implies that they may or may not be addressable.

So selecting "addressable storage units" with strange sizes (e.g.,
minimally sized to contain single bit-field members) certainly seems
possible on a conforming implementation.
Keith Thompson
2017-05-03 15:34:30 UTC
Permalink
Nick Bowler <***@draconx.ca> writes:
[...]
Post by Nick Bowler
We can infer some meaning from other definitions... from the definition
of "byte" we see that bytes definitely count as an "addressable storage
unit". The definition of "bit" shows that bits are "storage units" and
implies that they may or may not be addressable.
[...]

It seems clear to me that bits are not addressable.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2017-05-05 07:10:39 UTC
Permalink
Post by Nick Bowler
Post by Keith Thompson
Post by Keith Thompson
Post by i***@gmail.com
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how
the bit fields are actually allocated.
Even if we know that the bit-fields are actually allocated contrary
to the rules, it's only a conformance issue if a strictly conforming
program can tell the difference between the way they are allocated
and every possible valid interpretation of the rules.
I think your reasoning is wrong here, for two different reasons.

The first is that there is a conformance issue if any program
behaves in a way that is not consistent with the Standard's
requirements, not just strictly conforming programs.

The second is that there is a conformance issue if (and only if,
IIANM) a program behaves in a way that is not consistent with the
Standard's requirements /as those requirements are understood by
WG14/ (and possibly other ISO bodies, I'm not completely sure
which group has definitive authority). An argument might be made
that a particular interpretation is what WG14 intended, and go on
to conclude that an observed behavior could be non-conforming.
But the behavior only /is/ non-conforming if it is not consistent
with WG14's interpretation of the Standard's requirements.
Post by Nick Bowler
Post by Keith Thompson
An implementation may allocate any addressable storage unit large
enough to hold a bitfield. If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit.
struct s {
int a : 6;
_Bool b : 1;
/* ... */
};
Whatever kind of "unit" the implementation chooses to use, after 6 bits
have been allocated for `a`, there is clearly enough remaining space for
`b`.
I think there's some shaky foundation here because the standard does
not formally define "addressable" nor does it formally define "storage
unit", though there may be definitions of these terms in one of the
standard's normative references.
We can infer some meaning from other definitions... from the definition
of "byte" we see that bytes definitely count as an "addressable storage
unit". The definition of "bit" shows that bits are "storage units" and
implies that they may or may not be addressable.
So selecting "addressable storage units" with strange sizes (e.g.,
minimally sized to contain single bit-field members) certainly seems
possible on a conforming implementation.
In N1570, section 6.7.2.1 p15 says this:

Within a structure object, the non-bit-field members and the
units in which bit-fields reside have addresses that
increase in the order in which they are declared. [...]

This sentence makes it clear that "addressable units" is meant in
the usual sense of C addressability, ie, one or more char-sized
chunks, whose addresses are suitable for use with functions
such as memcpy(), etc.
s***@casperkitty.com
2017-05-03 17:01:20 UTC
Permalink
On Tuesday, May 2, 2017 at 5:23:11 PM UTC-5, Keith Thompson wrote:
Whatever kind of "unit" the implementation chooses to use, after 6 bits
have been allocated for `a`, there is clearly enough remaining space for
`b`. tcc allocates `b` in a new 32-bit word. It refers to "a bit-field
that immediately follows another bit-field"; it makes no exception for
bit-fields of different types. I cannot think of any reasonable
interpretation by which that is conforming.
If the type of storage unit employed for a bitfield has both data bits and
padding bits, it would seem reasonable to allow an implementation to employ
those padding bits in a bitfield's representation if it happens to be
practical, but not to require such usage in cases where the implementer would
deem it impractical.

I would suggest that the behavior thus described would be allowed if an
implementation were to document that the first bitfield using a particular
specified type will be stored in an object whose total allocation is the
same as that type, but which only has enough data bits to hold the bitfield
(all other bits would be padding bits); the implementation could then go on
to specify cases where it would be deemed "practical" to use the padding
bits for the storage of additional bitfields.

I personally think the way bitfields are presently specified is totally
silly, since I can't think of anything portable code could do with bitfields
that it wouldn't be able to do if the Standard simply said that a
sequence of bitfields may be stored as any number of storage units of any
convenient alignment, provided that the storage layout of any initial
sequence is not affected by the presence of later bitfields or objects in
the structure. On systems which have quick "test sign" instructions but
slow shifts, the optimal layout for all of:

struct s1 { unsigned char x : 1; };
struct s2 { unsigned char y : 4; };
struct s3 { unsigned char x : 1; unsigned char y : 4; };
struct s4 { unsigned char y : 4; unsigned char x : 1; };

might be to have x (if present) in bit 7 and y (if present) in bits 0-3.
Is there anything useful portable code could do given the layout rules in
the Standard that it wouldn't be able to do if the structures were laid
out as I described?
Keith Thompson
2017-05-03 19:02:36 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Whatever kind of "unit" the implementation chooses to use, after 6 bits
have been allocated for `a`, there is clearly enough remaining space for
`b`. tcc allocates `b` in a new 32-bit word. It refers to "a bit-field
that immediately follows another bit-field"; it makes no exception for
bit-fields of different types. I cannot think of any reasonable
interpretation by which that is conforming.
If the type of storage unit employed for a bitfield has both data bits and
padding bits, it would seem reasonable to allow an implementation to employ
those padding bits in a bitfield's representation if it happens to be
practical, but not to require such usage in cases where the implementer would
deem it impractical.
How can a storage unit have "padding bits"? That term applies only
to integer types. I see no suggestion that the "storage units" in
which bit-fields are allocated are integers. (Bit-fields themselves
are of integer type; storage units are not.)

There is enough space in the storage unit for a 1-bit _Bool
bit-field. The compiler does not allocate that bit-field in that
storage unit. I'd say that's a clear violation of the requirements
in the standard.

If the authors of the compiler (tcc) used "padding bits" as an excuse,
I'd say they're playing with words so they can ignore what the standard
clearly requires. (As far as I know they don't; I haven't looked at any
tcc documentation.)

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-03 21:23:16 UTC
Permalink
Post by Keith Thompson
How can a storage unit have "padding bits"? That term applies only
to integer types. I see no suggestion that the "storage units" in
which bit-fields are allocated are integers. (Bit-fields themselves
are of integer type; storage units are not.)
I see nothing to suggest that storage units couldn't be allocated as
integers; indeed, that would be the most logical way to handle them.
Given something like:

struct { unsigned long x:2; unsigned long y:29; unsigned long z:2;} s1;

a typical implementation of s1.y++ would be something like:

unsigned long temp = *(unsigned long *)&s1;
unsigned long temp2 = temp + 4;  // Add 1, but shifted left since y starts at
                                 // bit 2
temp2 ^= temp;
temp2 &= 0x7FFFFFFC;             // mask covering y's bits (2 through 30)
*(unsigned long *)&s1 = temp ^ temp2;

with the proviso that the compiler-generated code wouldn't need to worry
about aliasing issues.

If loads and stores of the "unsigned long" type ignore some bits on read and
trash some bits on write, requiring that bitfields be allowed to use even
those padding bits would complicate the update process as much as handling the
general case where bit fields span storage units--a case the Standard says
implementations don't have to handle.
Post by Keith Thompson
If the authors of the compiler (tcc) used "padding bits" as an excuse,
I'd say they're playing with words so they can ignore what the standard
clearly requires. (As far as I know they don't; I haven't looked at any
tcc documentation.)
Nothing in the rationale suggests that the authors of the Standard intended
to forbid compilers from behaving in the way that they were previously
required to behave, nor that it would require them to document such behaviors
in a fashion that was less clear than the earlier Standard. If the old
behavior would be permissible if documented in opaque fashion, the fact that
an implementation documents it in clearer fashion might technically make the
implementation "non-conforming", but that would be a pretty minor "defect"
as such things go, and could perhaps best be remedied by adding a "pedants
only" appendix of the documentation and adding to the simpler description a
note that pedants should look in the appendix for the "official" definition
of the behavior, but everyone else should get by just fine with a simpler
description that would be equivalent in all ways except terminology.
Keith Thompson
2017-05-03 22:15:41 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
How can a storage unit have "padding bits"? That term applies only
to integer types. I see no suggestion that the "storage units" in
which bit-fields are allocated are integers. (Bit-fields themselves
are of integer type; storage units are not.)
I see nothing to suggest that storage units couldn't be allocated as
integers; indeed, that would be the most logical way to handle them.
struct { unsigned long x:2; unsigned long y:29; unsigned long z:2;} s1;
Not the best example; implementations are not required to support
unsigned long bit-fields.

[...]
Post by s***@casperkitty.com
Nothing in the rationale suggests that the authors of the Standard intended
to forbid compilers from behaving in the way that they were previously
required to behave, nor that it would require them to document such behaviors
in a fashion that was less clear than the earlier Standard.
Nothing in the rationale or in the standard suggests that this
requirement:

If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.

means anything other than what it actually says.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Francis Glassborow
2017-05-04 13:45:52 UTC
Permalink
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
How can a storage unit have "padding bits"? That term applies only
to integer types. I see no suggestion that the "storage units" in
which bit-fields are allocated are integers. (Bit-fields themselves
are of integer type; storage units are not.)
I see nothing to suggest that storage units couldn't be allocated as
integers; indeed, that would be the most logical way to handle them.
struct { unsigned long x:2; unsigned long y:29; unsigned long z:2;} s1;
Not the best example; implementations are not required to support
unsigned long bit-fields.
[...]
Post by s***@casperkitty.com
Nothing in the rationale suggests that the authors of the Standard intended
to forbid compilers from behaving in the way that they were previously
required to behave, nor that it would require them to document such behaviors
in a fashion that was less clear than the earlier Standard.
Nothing in the rationale or in the standard suggests that this
If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.
means anything other than what it actually says.
[...]
I am clear (I think) as to what the Standard currently requires and I
was not at meetings where it was discussed. However I have a strong
sense that we have now over-specified bit fields.

Given that the original intent (AIUI) of bit fields was to map bits in
(flag) registers/words to their usage, it would have been pretty stupid
for an implementer not to pack them into a single storage unit, but that
I think should be a QoI issue. The classical usage of bit-fields was
inherently non-portable.
However a programmer might pack user defined flags into bit-fields and
doing so is a standard way to optimise space over speed. So again in a
restricted memory environment packing user defined flags into a single
storage unit makes sense but I do not see the sense of requiring it for
platforms where the programmer might prefer speed over space. I would, I
think, prefer packed bit fields where space is at a premium and the
freedom to unpack them in other contexts.
I note that packed bit-fields can be faster as well as smaller in
contexts where the value of the storage unit can be kept in a register
or cache.
However, despite my feelings on the subject the Standard mandates that
they be packed. Fortunately it has not yet required that the
implementation select a sufficiently large unit of storage to
accommodate all the bit-fields in a struct in a single unit if such a
unit exists. Clearly there are times when allocating space in a 64-bit
type will be more space efficient than storing them in two 32-bit units, but
as far as I know, implementations are still free to choose.
Enough. Sometimes we need to change the things we can and not waste time
discussing changing things we cannot. If a programmer does not like C
rules they are free to change jobs to a place where C is not used. :)

Francis
Tim Rentsch
2017-05-05 06:44:28 UTC
Permalink
Post by Francis Glassborow
Post by Keith Thompson
[...]
Nothing in the rationale or in the standard suggests that this
If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.
means anything other than what it actually says.
[...]
I am clear (I think) as to what the Standard currently requires and I
was not at meetings where it was discussed. However I have a strong
sense that we have now over-specified bit fields.
Given that the original intent (AIUI) of bit fields was to map bits
in (flag) registers/words to their usage it would have been pretty
stupid for an implementer not to pack them into a single storage unit
but that I think should be a QoI issue. The classical usage of
bit-fields was inherently non-portable.
However a programmer might pack user defined flags into bit-fields and
doing so is a standard way to optimise space over speed. So again in a
restricted memory environment packing user defined flags into a single
storage unit makes sense but I do not see the sense of requiring it
for platforms where the programmer might prefer speed over space. I
would, I think, prefer packed bit fields where space is at a premium
and the freedom to unpack them in other contexts. [...]
You don't mention bit-fields of length zero, which prevent any
further packing in the current addressable unit (assuming it
is partially filled). How does this capability affect your
assessment?
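For concreteness, the construct looks like this (just a sketch):

struct flags {
    unsigned a : 3;
    unsigned   : 0;  /* zero-width bit-field: the next bit-field
                        starts a new addressable unit */
    unsigned b : 3;
};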
Francis Glassborow
2017-05-05 14:34:01 UTC
Permalink
Post by Tim Rentsch
Post by Francis Glassborow
Post by Keith Thompson
[...]
Nothing in the rationale or in the standard suggests that this
If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.
means anything other than what it actually says.
[...]
I am clear (I think) as to what the Standard currently requires and I
was not at meetings where it was discussed. However I have a strong
sense that we have now over-specified bit fields.
Given that the original intent (AIUI) of bit fields was to map bits
in (flag) registers/words to their usage it would have been pretty
stupid for an implementer not to pack them into a single storage unit
but that I think should be a QoI issue. The classical usage of
bit-fields was inherently non-portable.
However a programmer might pack user defined flags into bit-fields and
doing so is a standard way to optimise space over speed. So again in a
restricted memory environment packing user defined flags into a single
storage unit makes sense but I do not see the sense of requiring it
for platforms where the programmer might prefer speed over space. I
would, I think, prefer packed bit fields where space is at a premium
and the freedom to unpack them in other contexts. [...]
You don't mention bit-fields of length zero, which prevent any
further packing in the current addressable unit (assuming it
is partially filled). How does this capability affect your
assessment?
OK, I had missed that hack to enforce 'not packing'. However I really
dislike (almost wrote 'hate') this kind of thing. The intent of a
programmer should be clearly readable in the code. It also means that
the programmer must decide whether it will give him more speed at the cost
of more space, and the result is clearly non-portable.

Francis

s***@casperkitty.com
2017-05-05 18:40:22 UTC
Permalink
Post by Francis Glassborow
Post by Tim Rentsch
You don't mention bit-fields of length zero, which prevent any
further packing in the current addressable unit (assuming it
is partially filled). How does this capability affect your
assessment?
OK, I had missed that hack to enforce 'not packing'. However I really
dislike (almost wrote 'hate') this kind of thing. The intent of a
programmer should be clearly readable in the code. It also means that
the programmer must decide whether it will give him more speed but more
space and the result is clearly non-portable.
Some implementations attached that meaning to bitfields of length zero,
and no notable implementations attached any contrary meaning. If the
Standard were to simply say that implementations can place bitfields
however they see fit, that would leave zero-length bitfields without a
defined meaning. Perhaps it would have been helpful to say that if the
size of a structure containing a zero-length bitfield would be N bytes
if everything following the bitfield were omitted, nothing following
that bitfield may affect anything within the first N bytes of the
structure. That's the one aspect of layout I can see that portable code
should have reason to care about.
s***@casperkitty.com
2017-05-05 18:49:48 UTC
Permalink
Post by Francis Glassborow
Post by Tim Rentsch
Post by Francis Glassborow
Post by Keith Thompson
[...]
Nothing in the rationale or in the standard suggests that this
If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.
means anything other than what it actually says.
[...]
I am clear (I think) as to what the Standard currently requires and I
was not at meetings where it was discussed. However I have a strong
sense that we have now over-specified bit fields.
Given that the original intent (AIUI) of bit fields was to map bits
in (flag) registers/words to their usage it would have been pretty
stupid for an implementer not to pack them into a single storage unit
but that I think should be a QoI issue. The classical usage of
bit-fields was inherently non-portable.
However a programmer might pack user defined flags into bit-fields and
doing so is a standard way to optimise space over speed. So again in a
restricted memory environment packing user defined flags into a single
storage unit makes sense but I do not see the sense of requiring it
for platforms where the programmer might prefer speed over space. I
would, I think, prefer packed bit fields where space is at a premium
and the freedom to unpack them in other contexts. [...]
You don't mention bit-fields of length zero, which prevent any
further packing in the current addressable unit (assuming it
is partially filled). How does this capability affect your
assessment?
OK, I had missed that hack to enforce 'not packing'. However I really
dislike (almost wrote 'hate') this kind of thing. The intent of a
programmer should be clearly readable in the code. It also means that
the programmer must decide whether it will give him more speed but more
space and the result is clearly non-portable.
Zero-length bitfields tie in with the one aspect of bitfields which should
be portable; rather than have the Standard impose restrictions on how
implementations can store things which offer no benefit to portable code,
it should have focused on the one guarantee that zero-length bitfields
should need to offer: if the size of a structure with everything following
a zero-length bitfield omitted would be N bytes, then the value of everything
prior to the zero-length bitfield must be stored entirely within the first N
bytes, and the value of everything after must be stored entirely after the
first N bytes. Couple that with enforcement of the Common Initial Sequence
rule and portable code should have no reason to care about any other aspects
of layout.
Tim Rentsch
2017-05-06 07:08:35 UTC
Permalink
Post by Tim Rentsch
Post by Francis Glassborow
Post by Keith Thompson
[...]
Nothing in the rationale or in the standard suggests that this
If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits
of the same unit.
means anything other than what it actually says.
[...]
I am clear (I think) as to what the Standard currently requires and I
was not at meetings where it was discussed. However I have a strong
sense that we have now over-specified bit fields.
Given that the original intent (AIUI) of bit fields was to map bits
in (flag) registers/words to their usage it would have been pretty
stupid for an implementer not to pack them into a single storage unit
but that I think should be a QoI issue. The classical usage of
bit-fields was inherently non-portable.
However a programmer might pack user defined flags into bit-fields and
doing so is a standard way to optimise space over speed. So again in a
restricted memory environment packing user defined flags into a single
storage unit makes sense but I do not see the sense of requiring it
for platforms where the programmer might prefer speed over space. I
would, I think, prefer packed bit fields where space is at a premium
and the freedom to unpack them in other contexts. [...]
You don't mention bit-fields of length zero, which prevent any
further packing in the current addressable unit (assuming it
is partially filled). How does this capability affect your
assessment?
OK, I had missed that hack to enforce 'not packing'. However I really
dislike (almost wrote 'hate') this kind of thing. The intent of a
programmer should be clearly readable in the code. It also means that
the programmer must decide whether it will give him more speed but
more space and the result is clearly non-portable.
I agree the notation is clunky. Perhaps not the clunkiest of C's
various clunky notations, but certainly high on the list.
Presumably this choice was made for historical reasons.

As far as performance goes, I see the "no more in this unit"
thing as being about layout, not about (speed) performance. If
performance is a key consideration, I think it's almost always
better to avoid bitfields altogether and just do masking and
shifting explicitly (or use standard integer types so masking
and shifting simply aren't needed).
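By explicit masking and shifting I mean something along these lines (only
a sketch; the widths and names here are invented, and width is assumed to
be smaller than the number of bits in unsigned):

static inline unsigned get_field(unsigned word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((1u << width) - 1u);
}

static inline unsigned set_field(unsigned word, unsigned shift, unsigned width,
                                 unsigned value)
{
    unsigned mask = ((1u << width) - 1u) << shift;
    return (word & ~mask) | ((value << shift) & mask);
}

/* e.g. a 12-bit count kept in bits 6..17 of a plain unsigned int:
   word = set_field(word, 6, 12, get_field(word, 6, 12) + 1); */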

I'm not sure what to make of the comment about non-portability.
I see bitfields as serving two distinct and pretty much mutually
exclusive purposes. One is to match layout at the bit level to
some externally specified format, such as a device register or
something like that. This use is inherently non-portable. The
other purpose is a convenience factor for limiting range and
using space more efficiently. This use is portable in terms of
semantics (taking into account such things as how big the various
integer types are), and speed/space tradeoffs can be tuned by
choosing which members to make bitfields and which to make
unsigned char or some other standard integer type. Of course for
different platforms we will want to make different choices for
those, but that is always true when performance is critical.
What flexibility do you want that C does not currently provide?
Or more specifically, what additional features would you
propose to facilitate that?
Francis Glassborow
2017-05-06 10:25:21 UTC
Permalink
<snip/>
Post by Tim Rentsch
I'm not sure what to make of the comment about non-portability.
I see bitfields as serving two distinct and pretty much mutually
exclusive purposes. One is to match layout at the bit level to
some externally specified format, such as a device register or
something like that. This use is inherently non-portable.
Yes, agreed
Post by Tim Rentsch
The other purpose is a convenience factor for limiting range and
using space more efficiently.
True but these days we are generally only concerned with space saving
when dealing with micro-controllers. In general I would expect to use a
dedicated compiler when programming such hardware (not least because the
program is usually held in ROM with its own specific limitations). A
compiler for a micro-processor that did not pack bit-fields as tightly
as possible and make use of the other special features that a specific
micro-controller offers would not be of great use.
But code for a micro-controller almost always tailored to the choice of
hardware and so is generally non-portable.
Post by Tim Rentsch
This use is portable in terms of
semantics (taking into account such things as how big the various
integer types are), and speed/space tradeoffs can be tuned by
choosing which members to make bitfields and which to make
unsigned char or some other standard integer type.
That looks like premature optimisation to me. If I choose a char, then
the compiler has to provide a char regardless of whether it could do
better some other way.
Post by Tim Rentsch
Of course for
different platforms we will want to make different choices for
those, but that is always true when performance is critical.
What flexibility do you want that C does not currently provide?
Or more specifically, what additional features would you
propose to facilitate that?
As I have already said, this whole discussion will not (ever) change a
single line of the Standard. It may have educational benefit, but in
practice done is done, and if conforming to the Standard is relevant
then packed bit-fields is what we have.

Francis
Tim Rentsch
2017-05-08 03:39:06 UTC
Permalink
[...area of agreement...]
Post by Francis Glassborow
[One use case] is a convenience factor for limiting range and
using space more efficiently.
True but these days we are generally only concerned with space
saving when dealing with micro-controllers. [...]
I don't agree with that premise. Admittedly such cases for
"ordinary" platforms (whatever that means these days) are rarer
now than they used to be, but certainly not an endangered
species.
Post by Francis Glassborow
This use is portable in terms of semantics (taking into account
such things as how big the various integer types are), and
speed/space tradeoffs can be tuned by choosing which members to
make bitfields and which to make unsigned char or some other
standard integer type.
That looks like premature optimisation to me. [...]
Premature? Is that really what you mean? I didn't say anything
about when a choice might be made, only that having or making
such choices allows some tuning on the speed vs space spectrum.
Post by Francis Glassborow
Of course for different platforms we will want to make different
choices for those, but that is always true when performance is
critical. What flexibility do you want that C does not currently
provide? Or more specifically, what additional features would
you propose to facilitate that?
As I have already said, this whole discussion will not (ever)
change a single line of the Standard. [...]
Very likely not, but I hope that won't stop you from answering
the question. My interest is in understanding your reaction,
not (at least for now) in whether the Standard will be affected.
s***@casperkitty.com
2017-05-06 15:42:48 UTC
Permalink
Post by Tim Rentsch
I agree the notation is clunky. Perhaps not the clunkiest of C's
various clunky notations, but certainly high on the list.
Presumably this choice was made for historical reasons.
Some implementations supported bitfields before the Standard was written,
and the authors of the Standard seemed keen to avoid defining any kind of
"optional" syntax (as evidenced by the fact that doesn't define any syntactic
constructs without making them mandatory). Personally, I think that the
Standard should either make bitfields optional or else define a syntax that
would make them portable, e.g.

struct foo {
uint32_t asWord;
lowHalf : asWord.0:16;
upperHalf : asWord.16:16;
lowByte : asWord.0:8;
lowishByte : asWord.8:8;
highishByte : asWord.16:8;
highByte : asWord.24:8;
};

If the new syntax were specified as attaching names to portions of an
earlier-declared member, and bit numbers were specified as starting at
0 for the least-significant bit, a definition like the above would
unambiguously establish which bits went where. If bitfields had to be
tied to primitive types, there would be no need to change the compiler
logic for bit field accesses--merely change the code that processes the
declarations.
Post by Tim Rentsch
I see bitfields as serving two distinct and pretty much mutually
exclusive purposes. One is to match layout at the bit level to
some externally specified format, such as a device register or
something like that. This use is inherently non-portable.
In cases where the format to be matched is something that might be
meaningful among multiple implementations (e.g. data stored in a file),
layouts could be portable if the Standard allowed a way to actually
specify them.
bartc
2017-05-06 19:23:12 UTC
Permalink
Post by s***@casperkitty.com
Post by Tim Rentsch
I agree the notation is clunky. Perhaps not the clunkiest of C's
various clunky notations, but certainly high on the list.
Presumably this choice was made for historical reasons.
Some implementations supported bitfields before the Standard was written,
and the authors of the Standard seemed keen to avoid defining any kind of
"optional" syntax (as evidenced by the fact that doesn't define any syntactic
constructs without making them mandatory). Personally, I think that the
Standard should either make bitfields optional or else define a syntax that
would make them portable, e.g.
struct foo {
uint32_t asWord;
lowHalf : asWord.0:16;
upperHalf : asWord.16:16;
lowByte : asWord.0:8;
lowishByte : asWord.8:8;
highishByte : asWord.16:8;
highByte : asWord.24:8;
};
If the new syntax were specified as attaching names to portions of an
earlier-declared member, and bit numbers were specified as starting at
0 for the least-significant bit, a definition like the above would
unambiguously establish which bits went where. If bitfields had to be
tied to primitive types, there would be no need to change the compiler
logic for bit field accesses--merely change the code that processes the
declarations.
I've never used bitfields as they are now, and never will.

My approach would be to use normal types, such as an int matching a
machine word, then superimpose bit and bit-field operations on top.

This can be done now with logic operations (shift, mask etc), or with
macros that do the same.
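Something like this, for example (a rough sketch; lo is the low bit
number, n is the field width, and n must be smaller than the word width):

#define GETBITS(x, lo, n)     (((x) >> (lo)) & ((1u << (n)) - 1u))
#define SETBITS(x, lo, n, v)  ((x) = ((x) & ~(((1u << (n)) - 1u) << (lo))) \
                                   | (((unsigned)(v) & ((1u << (n)) - 1u)) << (lo)))

/* e.g. lowByte of X:  GETBITS(X, 0, 8)
        set topBit:    SETBITS(X, 31, 1, 1)  */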

Your proposal seems along similar lines. It seems simpler to implement too.

Probably, I wouldn't have bothered putting everything in a struct. In
another syntax I use, if X is a 32-bit word, then I would write:

lowHalf is: X.[15..0] # or X.[0..15]; X.[0:16] is another
# possibility
lowByte is: X.[0..7] # I've also used X.lsb

topBit is: X.[31]

I think these were allowed as both rvalues and lvalues.

In C, the syntax could be dressed up as macros:

#define lowByte 0..7 // then X.[lowByte], or:
#define lowByte [0..7] // then X.lowByte

Your proposal has the advantage that these are encapsulated, and
specific to this type; my syntax for bits and bit-fields is general
purpose and could be used anywhere.
--
bartc
s***@casperkitty.com
2017-05-07 18:48:20 UTC
Permalink
Post by bartc
Your proposal has the advantage that these are encapsulated, and
specific to this type; my syntax for bits and bit-fields is general
purpose and could be used anywhere.
Some form of "cookie-cutter" operator would be helpful; though usually
expressed as "dest = (dest & ~mask) | (newvalue & mask)" the equivalent form
"dest ^= (dest ^ newvalue) & mask" is often slightly easier to implement.

In any case, the Standard essentially requires all conforming compilers be
capable of generating code to read and update arbitrary ranges of bits
within a machine word, but doesn't allow programmers to take much advantage
of that. Compilers are already required to do 90%+ of the work that would
be needed to make bitfields portable, but in a way that reaps less than 10%
of the benefit. Finishing off the feature would thus offer very good "bang
for the buck".
Jakob Bohm
2017-05-10 03:28:17 UTC
Permalink
Post by s***@casperkitty.com
Post by Tim Rentsch
I agree the notation is clunky. Perhaps not the clunkiest of C's
various clunky notations, but certainly high on the list.
Presumably this choice was made for historical reasons.
Some implementations supported bitfields before the Standard was written,
and the authors of the Standard seemed keen to avoid defining any kind of
"optional" syntax (as evidenced by the fact that doesn't define any syntactic
constructs without making them mandatory). Personally, I think that the
Standard should either make bitfields optional or else define a syntax that
would make them portable, e.g.
struct foo {
uint32_t asWord;
lowHalf : asWord.0:16;
upperHalf : asWord.16:16;
lowByte : asWord.0:8;
lowishByte : asWord.8:8;
highishByte : asWord.16:8;
highByte : asWord.24:8;
};
If the new syntax were specified as attaching names to portions of an
earlier-declared member, and bit numbers were specified as starting at
0 for the least-significant bit, a definition like the above would
unambiguously establish which bits went where. If bitfields had to be
tied to primitive types, there would be no need to change the compiler
logic for bit field accesses--merely change the code that processes the
declarations.
I find your syntax overly verbose and actually clunky.

Here's a simpler alternative:

Define (in the next release of the standard) that the bitfield
allocation must use one of 4 standard algorithms and that limits.h must
set a define in the __STDC_ space identifying which one. The 4
algorithms would be:

0: Place fields at the least significant bit of the enclosing unit
first; the enclosing unit is the largest of unsigned int and the
specified base types of the fields placed therein. Only and always
start a new unit in the following cases:
A: Bit field will not fit in the remainder of the current unit.
B: Bit length is specified as 0, in which case the field may not
be referenced in any way.
C: Next field is not a bit field.
When ending a unit, reduce its allocated size to the smallest number
of memory cells that will hold all the fields placed therein, even if
that number of memory cells is not the size of any type.

1: As algorithm 0, but most significant bit first.

2: As 0 but enclosing unit is the specified base type of the field, and
also start a new unit whenever changing base type. Also do not
reduce the allocated size of a storage unit, but leave it at the size
of the base type.

3: As 2, but most significant bit first.

Note that using an even algorithm on a big endian machine or an odd
algorithm on a little endian machine will be more complicated, but not
impossible to implement.

The algorithm definitions above should be adjusted to match the actual
behavior of current and past implementations. For example, I don't
know if the implementations whose behavior currently resembles
algorithms 2 and 3 start a new storage unit when only the signedness of
the base type changes.


Code needing to use a bitfield for access to a portable item (such as a
TCP/IP packet header) can then provide two equivalent but slightly
different layouts for odd and even algorithms (sketched below), while
choosing types so as to be unaffected by the difference between 0/1 and
2/3 (that difference is of course what started this whole thread).
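For example (just a sketch, using the proposed macro and the first byte
of an IPv4 header, where the version field occupies the high nibble on
the wire):

#if __STDC_BITFIELD_ALG == 0 || __STDC_BITFIELD_ALG == 2  /* LSB first */
struct ip_vhl { unsigned char ihl : 4; unsigned char version : 4; };
#else                                                      /* MSB first */
struct ip_vhl { unsigned char version : 4; unsigned char ihl : 4; };
#endif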

Less portable programs can simply refuse to compile (using the #error
directive) if the algorithm is something unexpected, such as

#if __STDC_BITFIELD_ALG != 0 && defined(SOMEHOW_LITTLE_ENDIAN)
#error "This compiler uses an unsupported bitfield layout"
#elif __STDC_BITFIELD_ALG != 1 && defined(SOMEHOW_BIG_ENDIAN)
#error "This compiler uses an unsupported bitfield layout"
#elif __STDC_BITFIELD_ALG != 2 && defined(SOMEHOW_PDP11_ENDIAN)
#error "This compiler uses an unsupported bitfield layout"
#endif
Post by s***@casperkitty.com
Post by Tim Rentsch
I see bitfields as serving two distinct and pretty much mutually
exclusive purposes. One is to match layout at the bit level to
some externally specified format, such as a device register or
something like that. This use is inherently non-portable.
In cases where the format to be matched is something that might be
meaningful among multiple implementations (e.g. data stored in a file),
layouts could be portable if the Standard allowed a way to actually
specify them.
Agreed


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2017-05-10 16:13:26 UTC
Permalink
Jakob Bohm <jb-***@wisemo.com> writes:
[...]
Post by Jakob Bohm
Define (in the next release of the standard) that the bitfield
allocation must use one of 4 standard algorithms and that limits.h must
set a define in the __STDC_ space identifying which one. The 4
Existing macros whose names start with __STDC are predefined, not
defined in any particular header.

[...]
Post by Jakob Bohm
#if __STDC_BITFIELD_ALG != 0 && defined(SOMEHOW_LITTLE_ENDIAN)
Such macros generally begin *and end* with a double underscore.
I think it would be best to follow that convention.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jakob Bohm
2017-05-12 05:17:44 UTC
Permalink
Post by Keith Thompson
[...]
Post by Jakob Bohm
Define (in the next release of the standard) that the bitfield
allocation must use one of 4 standard algorithms and that limits.h must
set a define in the __STDC_ space identifying which one. The 4
Existing macros whose names start with __STDC are predefined, not
defined in any particular header.
OK, is there another clearly delineated "namespace" for
standard-specified header-defined macros?
Post by Keith Thompson
[...]
Post by Jakob Bohm
#if __STDC_BITFIELD_ALG != 0 && defined(SOMEHOW_LITTLE_ENDIAN)
Such macros generally begin *and end* with a double underscore.
I think it would be best to follow that convention.
OK, it was just a very rough draft.
Post by Keith Thompson
[...]
Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Philip Lantz
2017-05-14 23:32:10 UTC
Permalink
Post by Jakob Bohm
Post by Keith Thompson
Post by Jakob Bohm
Define (in the next release of the standard) that the bitfield
allocation must use one of 4 standard algorithms and that limits.h must
set a define in the __STDC_ space identifying which one. The 4
Existing macros whose names start with __STDC are predefined, not
defined in any particular header.
OK, is there another clearly delineated "namespace" for standard-
specified header-defined macros.
Is there some reason why this particular macro should be defined in a
header instead of predefined?
Richard Damon
2017-05-15 00:32:32 UTC
Permalink
Post by Philip Lantz
Post by Jakob Bohm
Post by Keith Thompson
Post by Jakob Bohm
Define (in the next release of the standard) that the bitfield
allocation must use one of 4 standard algorithms and that limits.h must
set a define in the __STDC_ space identifying which one. The 4
Existing macros whose names start with __STDC are predefined, not
defined in any particular header.
OK, is there another clearly delineated "namespace" for standard-
specified header-defined macros.
Is there some reason why this particular macro should be defined in a
header instead of predefined?
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but that it would still be reserved in all
cases) is that it can be added to an implementation by just editing the
header file.
s***@casperkitty.com
2017-05-15 16:55:21 UTC
Permalink
Post by Richard Damon
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but it still be reserved in all cases) is
that it can be added to an implementation by just editing the header file.
It would IMHO be helpful to have the Standard recommend that compilers
include a means of specifying that a translation unit should be formed by
combining two files. The most appropriate way of specifying such things
would vary depending upon the implementation, but a distribution that
contains a bunch of C files along with some headers that may be needed
when using various compilers could be recognizable as a C program without
having to edit the C files to add #include directives. If the source
files are under version control, having to add #include directives when
using certain compilers would make it harder to keep source files up to
date. While it might be possible to accompany each C file with another
file that would be fed to the preprocessor and that would #include the
necessary headers and then the file containing the actual code, that would
be rather more clunky.
Keith Thompson
2017-05-15 18:50:59 UTC
Permalink
Post by s***@casperkitty.com
Post by Richard Damon
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but it still be reserved in all cases) is
that it can be added to an implementation by just editing the header file.
It would IMHO be helpful to have the Standard recommend that compilers
include a means of specifying a translation unit should be formed by
combining two files. The most appropriate way of specifying such things
would vary depending upon the implementation, but a distribution that
contains a bunch of C files along with some headers that may be needed
when using various compilers could be recognizable as a C program without
having to edit the C files to add #include directives. If the source
files are under version control, having to add #include directives when
using certain compilers would make it harder to keep source files up to
date. While it might be possible to accompany each C file with another
compiler that should be fed to the preprocessor that would #include the
necessary headers and then a file containing the actual code, that would
be rather more clunky.
You think the Standard should recommend that compilers implement the
equivalent of the "cat" command.

cat foo1.c foo2.c > foo.c && cc -c foo.c
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Kaz Kylheku
2017-05-15 19:37:02 UTC
Permalink
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Richard Damon
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but it still be reserved in all cases) is
that it can be added to an implementation by just editing the header file.
It would IMHO be helpful to have the Standard recommend that compilers
include a means of specifying a translation unit should be formed by
combining two files. The most appropriate way of specifying such things
would vary depending upon the implementation, but a distribution that
contains a bunch of C files along with some headers that may be needed
when using various compilers could be recognizable as a C program without
having to edit the C files to add #include directives. If the source
files are under version control, having to add #include directives when
using certain compilers would make it harder to keep source files up to
date. While it might be possible to accompany each C file with another
compiler that should be fed to the preprocessor that would #include the
necessary headers and then a file containing the actual code, that would
be rather more clunky.
You think the Standard should recommend that compilers implement the
equivalent of the "cat" command.
cat foo1.c foo2.c > foo.c && cc -c foo.c
Or, just generate the text

#include "foo1.c"
#include "foo2.c"

into foo.c. This way the __FILE__ and __LINE__ will refer to the
correct original source files, and no bulk copying is done.
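For concreteness, a minimal sketch of the difference (the file names and
contents below are invented for illustration):

/* foo.c -- generated, contains nothing but the two directives */
#include "foo1.c"
#include "foo2.c"

/* foo1.c */
#include <stdio.h>
void report(void)
{
    /* Because foo.c pulls this file in with #include rather than by
       copying its text, __FILE__ here is "foo1.c" and __LINE__ is the
       line number within foo1.c. */
    printf("%s:%d\n", __FILE__, __LINE__);
}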
Kaz Kylheku
2017-05-15 19:34:41 UTC
Permalink
Post by s***@casperkitty.com
Post by Richard Damon
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but it still be reserved in all cases) is
that it can be added to an implementation by just editing the header file.
It would IMHO be helpful to have the Standard recommend that compilers
include a means of specifying a translation unit should be formed by
combining two files. The most appropriate way of specifying such things
would vary depending upon the implementation, but a distribution that
contains a bunch of C files along with some headers that may be needed
when using various compilers could be recognizable as a C program without
having to edit the C files to add #include directives.
The problem is that there are numerous conventions by which programs use
header files.

Ultimately, header files are a silly thing, which can be automated
instead of maintained by hand.

See here: http://www.hwaci.com/sw/mkhdr/makeheaders.html

The makeheaders utility allows you to arrange a program so that every .c
file just needs to include the same-named .h file: foo.c includes foo.h,
and so on. The utility scans the contents of the .c files and, for each
one, generates the .h file which supplies the declarations of entities in
the other .c files which that .c file needs.

In other words, you can use C somewhat more like a higher level
header-file-free programming language.
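For illustration, a rough sketch of the resulting layout (file names and
contents are invented; the .h files are produced by makeheaders rather than
written by hand):

/* util.c */
#include "util.h"        /* generated: declarations this file needs */
int twice(int x) { return 2 * x; }

/* main.c */
#include <stdio.h>
#include "main.h"        /* generated: declares twice() from util.c */
int main(void) { printf("%d\n", twice(21)); return 0; }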
Jakob Bohm
2017-05-15 21:38:49 UTC
Permalink
Post by s***@casperkitty.com
Post by Richard Damon
One advantage of defining that you need to include a header file to be
sure the symbol is defined (but it still be reserved in all cases) is
that it can be added to an implementation by just editing the header file.
It would IMHO be helpful to have the Standard recommend that compilers
include a means of specifying a translation unit should be formed by
combining two files. The most appropriate way of specifying such things
would vary depending upon the implementation, but a distribution that
contains a bunch of C files along with some headers that may be needed
when using various compilers could be recognizable as a C program without
having to edit the C files to add #include directives. If the source
files are under version control, having to add #include directives when
using certain compilers would make it harder to keep source files up to
date. While it might be possible to accompany each C file with another
compiler that should be fed to the preprocessor that would #include the
necessary headers and then a file containing the actual code, that would
be rather more clunky.
The standard solution is for your build system to insert the needed
include lines into an always-generated, not-version-controlled,
otherwise empty file such as "mysysincludes.h", and then have the
version-controlled source do simply

#include "mysysincludes.h"

This provides a clear separation of concerns and allows the version
controlled source files to judiciously place other directives
before/after that line, or even omit it completely in source files that
don't need those varying system headers.

For some projects, it may even pay to create multiple such files:

#include "myinc-unistd.h" // unistd.h or suitable bunch of old .h s
#include "myinc-openflgs.h" // Whatever provides the flags for open()
#include "myinc-sockets.h" // Whatever provides the usual inet stuff
// etc.

Note that, for source control purposes, those generated files are
intermediate build files (like .o files), not source code. The thing
under source control is the scripts/data used to generate the files.

This standard technique avoids adding cruft to the compiler and
language specification and has worked since before ANSI C.
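A hypothetical example of what the build might generate on one particular
target (the header names below are only an illustration; the real contents
depend on whatever probing the build system performs):

/* mysysincludes.h -- generated by the build system; do not edit or commit */
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>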


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
s***@casperkitty.com
2017-05-15 22:11:29 UTC
Permalink
Post by Jakob Bohm
The standard solution is for your build system to insert the needed
include lines into an always generate, not-version-controlled,
otherwise empty file such as "mysysincludes.h", then having the version
controlled source do simply
#include "mysysincludes.h"
This provides a clear separation of concerns and allows the version
controlled source files to judiciously place other directives
before/after that line, or even omit it completely in source files that
don't need those varying system headers.
If the files that one needs to work with have made such provisions, great.
There are many source code collections that have not made such provisions,
and having an old version of a source file which has such a #include
prepended be considered "newer" than a file that was in every other respect
more recent seems less than helpful.

Incidentally, another recommendation I'd add is that compilers should
include a means of indicating that certain include-file paths should behave
as though symlinked to other paths. If a project is supposed to include
some headers which are located outside it, #include "../../../../x/y/z.h"
may work if the source file is stored with the proper level of nesting in
a directory structure, or if the proper location is included in the list
of include search paths *and no other file with that same name exists
anywhere in that path*, but I'd regard both such approaches as broken
compared with having a means of specifying path aliases. If C had always
processed string concatenation within #include directives, the best way
to handle that might have been to have a convention where each file starts
with

#include "projdirs.h"

and then have other include statements use

#include ACME_X_PATH("y.h")

Unfortunately, many compilers don't provide any means of generating a
concatenated path string. One could concatenate non-string items in the
preprocessor and then stringize them, but that could fail if any path
component matches the name of a macro.
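For what it's worth, a rough sketch of that stringize-and-concatenate
workaround, using invented macro names and an invented path; it relies on
the usual double-expansion idiom and, as noted above, breaks if any path
component happens to be defined as a macro:

#define STRINGIZE_(p)  #p
#define STRINGIZE(p)   STRINGIZE_(p)     /* extra level so the argument is expanded */
#define ACME_X_DIR     ../../acme/x      /* invented path, for illustration only */
#define ACME_X_PATH(f) STRINGIZE(ACME_X_DIR/f)

#include ACME_X_PATH(y.h)   /* becomes #include "../../acme/x/y.h" on typical compilers */

/* If any of the tokens acme, x, y, or h were themselves object-like macros,
   they would be expanded before stringizing and the path would come out wrong. */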
Jakob Bohm
2017-05-15 22:21:49 UTC
Permalink
Post by s***@casperkitty.com
Post by Jakob Bohm
The standard solution is for your build system to insert the needed
include lines into an always generate, not-version-controlled,
otherwise empty file such as "mysysincludes.h", then having the version
controlled source do simply
#include "mysysincludes.h"
This provides a clear separation of concerns and allows the version
controlled source files to judiciously place other directives
before/after that line, or even omit it completely in source files that
don't need those varying system headers.
If the files that one needs to work with have made such provisions, great.
There are many source code collections that have not made such provisions,
and having an old version of a source file which has such a #include
prepended be considered "newer" than a file that was in every other respect
more recent seems less than helpful.
Incidentally, another recommendation I'd add is that compilers should
include a means of indicating that certain include-file paths should behave
as though symlinked to other paths. If a project is supposed to include
some headers which are located outside it, #include "../../../../x/y/z.h"
may work if the source file is stored with the proper level of nesting in
a directory structure, or if the proper location is included in the list
of include search paths *and no other file with that same name exists
anywhere in that path*, but I'd regard both such approaches as broken
compared with having a means of specifying path aliases. If C had always
processed string concatenation within #include directives, the best way
to handle that might have been to have a convention where each file starts
with
#include "projdirs.h"
and then have other include statements use
#include ACME_X_PATH("y.h")
Unfortunately, many compilers don't provide any means of generating a
concatenated path string. One could concatenate non-string items in the
preprocessor and then stringize them, but that could fail if any path
component matches the name of a macro.
Those examples are all cases of trivial (or not so trivial) bugs in the
source code being compiled.

If this is "upstream" source code, the solution (as for all local
bugfixes and adaptations to upstream source code) is to have tools
(such as any decent version control software) that tracks the changes
in fresh upstream tarballs and local changes while assisting in any
merges of such changes to the same files.

It is no different than if the required change were to fix a
non-portable line of code that depends on a vendor-specific function,
such as the BSD bcopy() function.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
s***@casperkitty.com
2017-05-16 16:10:45 UTC
Permalink
Post by Jakob Bohm
Post by s***@casperkitty.com
Incidentally, another recommendation I'd add is that compilers should
include a means of indicating that certain include-file paths should behave
as though symlinked to other paths. If a project is supposed to include
some headers which are located outside it, #include "../../../../x/y/z.h"
may work if the source file is stored with the proper level of nesting in
a directory structure, or if the proper location is included in the list
of include search paths *and no other file with that same name exists
anywhere in that path*, but I'd regard both such approaches as broken
compared with having a means of specifying path aliases. If C had always
processed string concatenation within #include directives, the best way
to handle that might have been to have a convention where each file starts
with
#include "projdirs.h"
and then have other include statements use
#include ACME_X_PATH("y.h")
Unfortunately, many compilers don't provide any means of generating a
concatenated path string. One could concatenate non-string items in the
preprocessor and then stringize them, but that could fail if any path
component matches the name of a macro.
Those examples are all cases of trivial (or not so trivial) bugs in the
source code being compiled.
Which examples are "bugs"?

If a C source file needs to make use of an outside library, having the
#include directives include a path which coincides with where that library's
header is on the developer's machine is icky, but what better approach *is*
there, short of either duplicating the library in every project where it's
used (which could in turn cause other problems) or else relying upon
compilers' ability to search through a list of include-file paths?
s***@casperkitty.com
2017-05-12 19:16:48 UTC
Permalink
Post by Jakob Bohm
I find your syntax overly verbose and actually clunky.
Whether it's overly verbose or clunky depends upon whether code needs the
ability to access all of the bitfields in a word as a group. Since code
wouldn't generally need to care about how bit-fields are laid out unless
it was going to make use of the underlying storage, I'd consider the fairest
comparison to be between something like:

typedef struct
{
uint32_t CTRL;
unsigned enable : CTRL.0:1;
unsigned mode : CTRL.1:3;
unsigned prescalar : CTRL.4:28;
} foo_ctrl_reg;

and something like

typedef struct
{
uint32_t enable: 1;
uint32_t mode : 3;
uint32_t prescalar : 28;
} foo_ctrl_struct;
typedef union
{
uint32_t reg;
foo_ctrl_struct bits;
} foo_ctrl_reg;

I'd say being able to define the object in which the bit fields will be
stored in the same structure definition as the bit fields themselves would
be nicer than having to use a union to try to emulate such behavior, even
if the bitfield layout is known.

It might be nicer yet to have a means of declaring a type which could be
used as an integer type, but could *also* be accessed via bitfield syntax,
thus allowing something like:

WIDGET1->CTRL = 0x10000;

but also allowing

WIDGET1->CTRL.ENABLE = 1;

That kind of feature would require changes to more of the compiler than
would be necessary merely to allow structure definitions to include bitfield
overlay aliases.
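As a rough sketch of how the union version might be used today for a
memory-mapped register (the widget layout, the address, and the volatile
plumbing are invented for illustration, and the actual bit placement
remains implementation-defined):

#include <stdint.h>

typedef struct
{
    uint32_t enable    : 1;
    uint32_t mode      : 3;
    uint32_t prescalar : 28;
} foo_ctrl_struct;

typedef union
{
    uint32_t        reg;
    foo_ctrl_struct bits;
} foo_ctrl_reg;

typedef struct
{
    foo_ctrl_reg CTRL;
} widget_t;

#define WIDGET1 ((volatile widget_t *)0x40001000u)   /* hypothetical device address */

void widget_example(void)
{
    WIDGET1->CTRL.reg = 0x10000;       /* whole-register access */
    WIDGET1->CTRL.bits.enable = 1;     /* bit-field access through the union */
}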
David Brown
2017-05-03 10:47:13 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Post by i***@gmail.com
As stated by the C11 draft,
... If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent bits of
the same unit... I expected it to fit into 4 bytes since I used a
32-bit compiler. Specifically, I used gcc (tdm-1) 5.1.0. Is it against
the standard?
This *might* be non-conforming. We can't be sure without seeing how the
bit fields are actually allocated. A compiler may legally add arbitrary
padding at the end of a structure, so the output you're seeing is
consistent with all the bit fields being allocated correctly. But it's
unlikely that a compiler would allocate more padding at the end of a
structure than is actually needed for correct alignment.
In the earlier K&R days, the "unit" in the phrase "the same unit" referred
to an object of the specified member type. Thus, on a system with 8/16/32
As far as I can see, in C89/90 (and presumably in K&R C), the only types
allowed in bit-fields were int, unsigned int and signed int. It is only
with C99 and later that other integer types were allowed, and they
should follow the rules as described by Keith.

However, a C89/C90 compiler might allow _Bool and short bitfields as an
extension (_Bool being an extension already in C89/C90), and make its
own rules for packing them.
Kaz Kylheku
2017-05-03 15:13:54 UTC
Permalink
Post by David Brown
However, a C89/C90 compiler might allow _Bool and short bitfields as an
extension (_Bool being an extension already in C89/C90), and make its
own rules for packing them.
A C99+ implementation that still exposes _Bool in its C90 translation
mode, but changes the bit packing rules for it, would be pretty silly.
Any such struct layout change is a bad idea.

A C90 implementation which has _Bool specifically as a measure of C99
compatibility, to support a <stdbool.h> header and so on, is not
going to make up its own rules for packing; it wants to conform as
closely as possible to the rules that apply to any C99 stuff it implements.
If you see this behavior in such an implementation, it's almost
certainly a bug and not a case of "we are providing some C99 things but,
oh, changing them around just to mess with you".

That leaves C90 implementations that have this ugly _Bool-shit thing
just for shits and giggles rather than as a compatibility measure.
I doubt any exist.
Jakob Bohm
2017-05-04 16:08:46 UTC
Permalink
Post by Kaz Kylheku
Post by David Brown
However, a C89/C90 compiler might allow _Bool and short bitfields as an
extension (_Bool being an extension already in C89/C90), and make its
own rules for packing them.
A C99+ implementation that still exposes _Bool in its C90 translation
mode, but changes the bit packing rules for it, would be pretty silly.
Any such struct layout change is a bad idea.
A C90 implementation which has _Bool specifically for a measure of C99
compatibility, to support a <stdbool.h> header and so on, is not
going to make its rules for packing; it wans to be conforming as much
as possible to the rules that apply to any C99 stuff it implements.
If you see this behavior in such an implementation, it's almost
certainly a bug and not a case of "we are providing some C99 things but,
oh, changing them around just to mess with you".
That leaves C90 implementations that have this ugly _Bool-shit thing
just for shits and giggles rather than as a compatibilty measure.
I doubt any exist.
I believe classic gcc (e.g. gcc 2.x and older) had a C "boolish" type
as an extension (for some value of the word extension).

Later versions of gcc arbitrarily removed many long-standing extensions
for no good reason; I don't recall whether GCC's boolean type was one
victim of this.

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2017-05-04 17:01:09 UTC
Permalink
Jakob Bohm <jb-***@wisemo.com> writes:
[...]
Post by Jakob Bohm
I believe classic gcc (e.g. gcc 2.x and older) had a C "boolish" type
as an extension (for some value of the word extension).
The documentation for gcc 2.8.1, the earliest I have easy access to,
doesn't mention such an extension.
Post by Jakob Bohm
Later versions of gcc arbitrarily removed many long standing extensions
for no good reason, I don't recall if GCC's boolean type was one victim
of this.
One good reason would be to encourage use of the corresponding standard
feature rather than the non-standard extension.

But in at least one case, gcc implemented a form of designated
initializer with a syntax that differed from what C99 later defined:

struct {int x, y;} obj = { x: 10, y: 20 };

gcc 7.1.0, released just a few days ago, warns about this with the right
options, but it still accepts it by default.
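For comparison, the C99 spelling of the same initializer is:

struct {int x, y;} obj = { .x = 10, .y = 20 };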
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jakob Bohm
2017-05-04 18:59:56 UTC
Permalink
Post by Keith Thompson
[...]
Post by Jakob Bohm
I believe classic gcc (e.g. gcc 2.x and older) had a C "boolish" type
as an extension (for some value of the word extension).
The documentation for gcc 2.8.1, the earliest I have easy access to,
doesn't mention such an extension.
Must have misremembered that, sorry.
Post by Keith Thompson
Post by Jakob Bohm
Later versions of gcc arbitrarily removed many long standing extensions
for no good reason, I don't recall if GCC's boolean type was one victim
of this.
One good reason would be to encourage use of the corresponding standard
feature rather than the non-standard extension.
The cases I was aware of had no corresponding standard feature or had
an *identical* standard feature in a related standard:

- lvalue casts, as also implemented by at least the Borland and Microsoft
compilers at the time. There is no corresponding standard feature that
covers all cases. (Consider (short)(unsshortarray[i++]) /= -10; )

- Using C++ semantics for the ?: and , operators when compiling as
GNU89 instead of C89.
Post by Keith Thompson
But in at least one case, gcc implemented a form of designated
struct {int x, y;} obj = { x: 10, y: 20 };
gcc 7.1.0, released just a few days ago, warns about this with the right
options, but it stll accepts it by default.
Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2017-05-03 15:33:02 UTC
Permalink
David Brown <***@hesbynett.no> writes:
[...]
Post by David Brown
As far as I can see, in C89/90 (and presumably in K&R C), the only types
allowed in bit-fields were int, unsigned int and signed int. It is only
with C99 and later that other integer types were allowed, and they
should follow the rules as described by Keith.
C90 says:

A bit-field shall have a type that is a qualified or unqualified
version of one of int, unsigned int, or signed int.

C11 says:

A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.

I believe C99 is the same as C11.
Post by David Brown
However, a C89/C90 compiler might allow _Bool and short bitfields as an
extension (_Bool being an extension already in C89/C90), and make its
own rules for packing them.
Yes, but a conforming C90 implementation must still issue a
diagnostic for a bit-field of a type other than int, unsigned int,
or signed int, since it's a constraint violation. (It can support
other types without a diagnostic in non-conforming mode.)
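For example (a sketch; the member names are arbitrary):

struct bf_example {
    unsigned int ok  : 3;  /* permitted in C90 and later */
    short        odd : 2;  /* constraint violation in C90 (diagnostic required);
                              in C99/C11 it is valid only if the implementation
                              documents short as a permitted bit-field type */
};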
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-03 17:14:23 UTC
Permalink
Post by Keith Thompson
Yes, but a conforming C90 implementation must still issue a
diagnostic for a bit-field of a type other than int, unsigned int,
or signed int, since it's a constraint violation. (It can support
other types without a diagnostic in non-conforming mode.)
If documented extensions are not allowed to override what would *otherwise*
be constraints, what is the purpose of the rule requiring that all extensions
be documented? An implementation that happens to do something useful if code
violates some particular constraint would be under no obligation to document
that fact if it also issued a diagnostic. From the point of view of the
Standard, in what situation could failure to document an extension make an
implementation non-conforming, *except* in cases where the implementation
failed to issue a diagnostic that would otherwise have been required?
Kaz Kylheku
2017-05-03 17:34:21 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Yes, but a conforming C90 implementation must still issue a
diagnostic for a bit-field of a type other than int, unsigned int,
or signed int, since it's a constraint violation. (It can support
other types without a diagnostic in non-conforming mode.)
If documented extensions are not allowed to override what would *otherwise*
be constraints, what is the purpose of the rule requiring that all extensions
be documented?
The effect of the rule is to define what "extension" means. If a
behavior which arises out of a situation for which ISO C has no
requirements is not documented, then it isn't an extension.
Post by s***@casperkitty.com
An implementation that happens to do something useful if code
violates some particular constraint would be under no obligation to document
that fact if it also issued a diagnostic.
Then the situation, interpreted from the Standard's point of view,
would be that the useful behavior may not be called an
"extension" if the implementation is to be regarded as conforming to the
requirement of documenting all extensions.