Help on this typedef

Post by fl
Hi,
I see this typedef, which has '[1]' while the below declaration without it.
What use is '[1]'?
Thanks
-----------------
typedef __mpz_struct mpz_t[1];

This is saying that an mpz_t is an array of __mpz_struct's of length 1.

Post by fl
mpz_t s_divisor;

Because of the typedef, this is like saying

__mpz_struct s_divisor[1];

Keith Thompson

2017-05-11 03:11:56 UTC

Post by fl
I see this typedef, which has '[1]' while the below declaration without it.
What use is '[1]'?
Thanks
-----------------
typedef __mpz_struct mpz_t[1];

It makes "mpz_t" an alias for the type "__mpz_struct[1]", an array of 1
__mpz_struct.

Post by fl
mpz_t s_divisor;

This defines an object named s_divisor of type mpz_t.
The declaration is equivalent to:

__mpz_struct s_divisor[1];

An array of a single element of a given type is similar to single object
of that same type, but the type and the available operations are
different.

I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics. A function that takes an argument of type
mpz_t:

void func(mpz_t param);

is exactly equivalent to:

void func(__mpz_struct param[1]);

which in turn is exactly equivalent to:

void func(__mpz_struct *param);

If you have questions about arrays and pointers, read section 6 of the
comp.lang.c FAQ, <http://www.c-faq.com/>.
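A minimal sketch of the parameter adjustment described above. The names `struct num`, `num_t`, `num_set`, and `num_demo` are hypothetical stand-ins and do not reflect the real layout of `__mpz_struct`:

```c
#include <stddef.h>

/* Hypothetical stand-in for __mpz_struct; not the real layout. */
struct num { long value; };
typedef struct num num_t[1];

/* The parameter type is adjusted to `struct num *`, so the
   assignment modifies the caller's object. */
void num_set(num_t n, long v)
{
    n->value = v;               /* same as n[0].value = v */
}

/* Outside a parameter list, num_t is a real array of one element. */
int num_sizes_match(void)
{
    return sizeof (num_t) == sizeof (struct num);
}

long num_demo(void)
{
    num_t x;                    /* array of one struct num */
    num_set(x, 42);             /* x decays to &x[0]; no explicit & */
    return x->value;            /* the callee's write is visible here */
}
```

Calling `num_demo()` returns the value stored by `num_set`, even though the call site never takes an address explicitly.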

--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

David Brown

2017-05-11 08:03:52 UTC

Post by fl
I see this typedef, which has '[1]' while the below declaration without it.
What use is '[1]'?
Thanks
-----------------
typedef __mpz_struct mpz_t[1];

It makes "mpz_t" an alias for the type "__mpz_struct[1]", an array of 1
__mpz_struct.

Post by fl
mpz_t s_divisor;

This defines an object named s_divisor of type mpz_t.
__mpz_struct s_divisor[1];
An array of a single element of a given type is similar to single object
of that same type, but the type and the available operations are
different.
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

s***@casperkitty.com

2017-05-11 15:35:26 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

Using a literal 1 within a typedef in that case would seem unusual. There
are some situations where the pseudo-pass-by-reference can be useful to
kludge around some of C's limitations. While such usage isn't especially
common it's probably more common than any *other* use for size-1 array
typedefs.

David Brown

2017-05-12 08:34:22 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

Using a literal 1 within a typedef in that case would seem unusual.

It would be better to have:

#define noOfMPZs 1

and use that rather than "1" in the typedef.

But we don't know the origin of the code, or how well thought-out it is,
or what might have been put in as a quick hack. We are just guessing.

Post by s***@casperkitty.com
There
are some situations where the pseudo-pass-by-reference can be useful to
kludge around some of C's limitations. While such usage isn't especially
common it's probably more common than any *other* use for size-1 array
typedefs.

j***@gmail.com

2017-05-12 13:01:16 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

Using a literal 1 within a typedef in that case would seem unusual.

#define noOfMPZs 1
and use that rather than "1" in the typedef.

In this context "1" has deep meaning.

Post by David Brown
But we don't know the origin of the code, or how well thought-out it is,
or what might have been put in as a quick hack. We are just guessing.

(sigh)

Let me help you:

https://www.google.co.jp/search?q=__mpz_struct

--
Joel Rees

Twiddling my pi:
http://joels-programming-fun.blogspot.jp/2016/11/using-gmp-to-approximate-pi.html
https://ja.osdn.net/users/reiisi/pastebin/4462

Keith Thompson

2017-05-12 16:38:29 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

Using a literal 1 within a typedef in that case would seem unusual.

#define noOfMPZs 1
and use that rather than "1" in the typedef.
But we don't know the origin of the code, or how well thought-out it is,
or what might have been put in as a quick hack. We are just guessing.

It's from GMP, the GNU multiple precision arithmetic library.

Quoting its documentation:

When a GMP variable is used as a function parameter, it's
effectively a call-by-reference, meaning if the function
stores a value there it will change the original in the caller.
Parameters which are input-only can be designated `const' to
provoke a compiler error or warning on attempting to modify them.

When a function is going to return a GMP result, it should
designate a parameter that it sets, like the library functions
do. More than one value can be returned by having more than one
output parameter, again like the library functions. A `return'
of an `mpz_t' etc doesn't return the object, only a pointer,
and this is almost certainly not what's wanted.

(An alternative might have been to make mpz_t a small struct, and
require explicit pointers to simulate pass-by-reference.)
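The convention in the quoted documentation (result parameters first, input-only parameters marked `const`) can be sketched without GMP itself. Here `struct big`, `big_t`, `big_add`, and `big_divmod` are hypothetical names for illustration only, not part of the GMP API:

```c
/* Hypothetical stand-in type; not GMP's real layout or API. */
struct big { long v; };
typedef struct big big_t[1];

/* One result: the function sets its first parameter.
   The inputs are const, so writing through them is diagnosed. */
void big_add(big_t rop, const big_t a, const big_t b)
{
    rop->v = a->v + b->v;
}

/* Two results: one output parameter for each. */
void big_divmod(big_t quot, big_t rem, const big_t n, const big_t d)
{
    quot->v = n->v / d->v;
    rem->v  = n->v % d->v;
}
```

A caller declares `big_t q, r;` and passes them by name. No `&` is needed because each array decays to a pointer to its single element.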

j***@gmail.com

2017-05-12 12:08:01 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

An indefinite array type has more limitations than a type of an array with one element.

Both, of course, can refer to an array of more than one element when instantiated.

Tim Rentsch

2017-05-13 21:25:42 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

An indefinite array type has more limitations than a type of an array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

j***@gmail.com

2017-05-15 06:36:22 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

An indefinite array type has more limitations than a type of an array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

--
Joel Rees

Randomly ranting:
http://reiisi.blogspot.com

Tim Rentsch

2017-05-15 14:23:17 UTC

[...]

Post by j***@gmail.com
An indefinite array type has more limitations than a type of an array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

j***@gmail.com

2017-05-18 14:40:56 UTC

Post by Tim Rentsch
[...]

Post by j***@gmail.com
An indefinite array type has more limitations than a type of an
array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

And before you C++ types came around, we just left the array size
out: char text[]. But that was declared in all cases to be equivalent
to char *, so it means something else, now. And if that's what you
mean in such a header, you say char *, instead.

Now, to forestall your initial complaints, I'll point out that this
header is not intended to be directly allocated with malloc() and
sizeof like most structures are allocated:

#if defined DO_IT_WRONG
malloc( sizeof (string_header_t) ); /* Don't do this! */
#endif

One could allocate it like this:

string_header_t * string_alloc( size_t length )
{ return malloc( length + sizeof (string_header_t) );
}

particularly if you mean length to include the trailing NUL.

Usually, though, you use this kind of header with your own
allocation. You usually set up your own string pool and manage
it yourself:
-------------------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct string_header_s
{ short length;
char string[ 1 ];
} string_header_t;

char bigblock[ 100000 ];
char * here = bigblock;

string_header_t * string_allocate( long length )
{ char * place = here; /* the header goes at the start of the new block */
if ( length < 0 || (size_t) length + sizeof (string_header_t) > (size_t) ( bigblock + sizeof bigblock - here ) )
{ return NULL;
}
here += length + sizeof (string_header_t);
( (string_header_t *) place )->length = length;
return (string_header_t *) place;
}

string_header_t * string_save( char string[] )
{ long length = strlen( string );
string_header_t * headerp = string_allocate( length );
if ( headerp == NULL ) /* pool exhausted */
{ return NULL;
}
memcpy( headerp->string, string, length );
headerp->string[ length ] = '\0';
return headerp;
}

void print_string( string_header_t * header )
{ int i;
for ( i = 0; i < header->length; ++i )
{ putchar( header->string[ i ] );
}
}

int main ( int argc, char * argv[] )
{ string_header_t * thing;

thing = string_save( "hello" );

/* ... */

print_string( thing );
putchar( '\n' );

return EXIT_SUCCESS;
}
----------------------

--
Joel Rees

Delusions of being a novelist:
http://reiisi.blogspot.com/p/novels-i-am-writing.html

Keith Thompson

2017-05-18 16:35:26 UTC

[...]

Post by Tim Rentsch
Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

That's called the "struct hack", and it was always of questionable
validity. You can likely get away with allocating extra memory and
referring to, for example, obj.text[10], but the behavior is undefined.

C99 introduced flexible array members, described in N1570 6.7.2.1p18,
as a well-defined replacement for the struct hack. See also question
2.6 of the comp.lang.c FAQ, <http://c-faq.com/>.
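For comparison, a short sketch of the flexible array member form mentioned above. The names `struct fstring` and `fstring_new` are made up for this illustration:

```c
#include <stdlib.h>
#include <string.h>

struct fstring {
    size_t length;
    char   text[];      /* flexible array member: must be the last member */
};

/* Allocate the header and the characters in one block. */
struct fstring *fstring_new(const char *s)
{
    size_t n = strlen(s);
    struct fstring *p = malloc(sizeof *p + n + 1);  /* +1 for the NUL */

    if (p == NULL)
        return NULL;
    p->length = n;
    memcpy(p->text, s, n + 1);
    return p;
}
```

With this declaration, `p->text[3]` is a defined access as long as the index stays within the space that was actually allocated.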

Post by j***@gmail.com
And before you C++ types came around, we just left the array size
out: char text[]. But that was declared in all cases to be equivalent
to char *, so it means something else, now. And if that's what you
mean in such a header, you say char *, instead.

Prior to C99's introduction of flexible array members, defining a
struct member as `char text[]` was invalid.

And no, `char text[]` is in no way equivalent to `char *text`. See
section 6 of the comp.lang.c FAQ.

You can define a pointer member, but then you have to allocate memory
for it to point to. With an array member, the memory is allocated
within the struct object itself.
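The difference between the two kinds of member can be sketched as follows. All names here (`struct by_pointer`, `struct by_array`, and the two `_init` functions) are hypothetical, and the fixed size 16 is an arbitrary choice for the illustration:

```c
#include <stdlib.h>
#include <string.h>

struct by_pointer { size_t length; char *text; };      /* storage lives elsewhere  */
struct by_array   { size_t length; char  text[16]; };  /* storage is in the struct */

/* The pointer member needs memory of its own before it can be used. */
int by_pointer_init(struct by_pointer *s, const char *src)
{
    size_t n = strlen(src);

    s->text = malloc(n + 1);
    if (s->text == NULL)
        return -1;
    memcpy(s->text, src, n + 1);
    s->length = n;
    return 0;
}

/* The array member is already there, but its capacity is fixed. */
int by_array_init(struct by_array *s, const char *src)
{
    size_t n = strlen(src);

    if (n >= sizeof s->text)
        return -1;
    memcpy(s->text, src, n + 1);
    s->length = n;
    return 0;
}
```

The pointer version requires a matching `free(s->text)` later. The array version needs no separate allocation or release.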

[...]

s***@casperkitty.com

2017-05-18 17:59:08 UTC

Post by Keith Thompson
That's called the "struct hack", and it was always of questionable
validity. You can likely get away with allocating extra memory and
referring to, for example, obj.text[10], but the behavior is undefined.

The Standard never defined the behavior, but implementers back then
recognized that features which were useful should be supported on all
platforms where they were practical, without regard for whether the
Standard mandated such support. The construct was widespread well before
C89 was published (though a lot of code used a superior version where the
array size was given as zero until the standard outlawed that) and I
would regard as ludicrous the idea that the failure of the Standards
Committee to recognize it should have caused compilers to withdraw support.

j***@gmail.com

2017-05-19 06:40:47 UTC

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

You don't believe in reading before you spout, I take it.

Go back, read the source you clipped and tell me that again.

If you prove that you have read and understood the source that you
say is undefined, I'll listen to you.

Otherwise, <shrug>.


j***@gmail.com

2017-05-19 15:12:49 UTC

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

You don't believe in reading before you spout, I take it.
Go back, read the source you clipped and tell me that again.
If you prove that you have read and understood the source that you
say is undefined, I'll listen to you.
Otherwise, <shrug>.

<snip>

If you want to have some kind of fat pointer, like a string referenced with
a char* with a built-in size, you can still do that, but the key is to make
sure the alignment for the pointer to your string is valid for the maximum
alignment for your implementation.

In C11, this is defined in max_align_t, but in versions before, you'd have
to sizeof an ugly union that covers the unique range of basic types. It's
basically what malloc does to guarantee that any allocation is properly
aligned for any given type.
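A rough sketch of the pre-C11 "ugly union" idea. The member list is an assumption (a real implementation may have stricter types that are not listed), and `union max_align_guess` and `round_up_to_max_align` are hypothetical names:

```c
#include <stddef.h>

/* Hypothetical pre-C11 stand-in for max_align_t. The member list is a
   guess at the most strictly aligned basic types. */
union max_align_guess {
    long         l;
    double       d;
    long double  ld;
    void        *p;
    void       (*fp)(void);
};

/* Round n up to a multiple of the union's size, so that whatever
   follows a prefix of that many bytes stays suitably aligned. */
size_t round_up_to_max_align(size_t n)
{
    size_t a = sizeof (union max_align_guess);

    return (n + a - 1) / a * a;
}
```

Using the union's size rather than its alignment may over-pad, but it needs no alignment operator, which C89 and C99 lack.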

So, the key is to create a blob that's of size max_align_t or equivalent
sizeof union type (or a multiple of max_align_t if there's a lot of fat),
and add that to the allocation of the object you wish to allocate (in
this case a string). Then in your allocation, you allocate the total
chunk of memory, but return a pointer at an offset that points to your
object. You would return a pointer at the memory address location
referenced by '*'.

^   -> *
| size | string |
0      8 (or 16)

If it's a string that has a length field, and max_align_t is 8 bytes, then
you would have to add 8 bytes to every allocation to guarantee alignment
for any generalized type. The nice thing is that you can still pass your
fat pointer string object to standard library functions and it'll still
work. You just need to be careful when freeing that you go back 8 bytes
to free the actual address that malloc gave you or you'll corrupt the
allocator. You'd have to be very careful with functions like strdup as
you'll end up mixing fat and regular pointers.

To store your length, you'd cast the address referenced by '^' to your type
of choice and set the value, e.g. *((size_t*)p) = strlen (string).

Using this technique, you can do some pretty cool stuff, like introduce
an allocator to artificially run out of memory at a given threshold, or if
you're storing the sizes of heap allocations, keep statistics of your
memory usage, (i.e. if the size of the object is attached to the pointer,
you know how much is released when you free). The caveat is that the
compiler gives you no notice if you happen to use the wrong pointer at
the wrong time, so you can have normal looking code that's really messed
up, i.e. don't free '*', free '^', make sure you set the size at '^'.

^   -> *
| size | string |
0      8

If you package it up in proper 'alloc' and 'free' functions, it's
*relatively* safe to use. It's fairly easy to get bitten though and
introduce very hard to find errors (imo pointer corruption is probably
the worst kind of problem to track down).

\code
size_t current_memory = 0;

void* track_malloc( size_t size )
{
void* p = NULL;
void* mem;

/*
* To track the amount of used memory, each memory allocation is
* prefixed with a size_t object that allows the free function to
* update 'current_memory' correctly when releasing memory.
*/
mem = malloc( c_maxalign_sizeof (size_t) + size );

if ( mem )
{
*((size_t*)mem) = size;
p = (unsigned char*)mem + c_maxalign_sizeof (size_t);

current_memory += size;
}

return p;
}

void track_free( void* p )
{
void* mem = NULL;
size_t p_size;

if ( p )
{
mem = (unsigned char*)p - c_maxalign_sizeof (size_t);
p_size = *((size_t*)mem);

free( mem );

current_memory -= p_size;
}
}
\endcode
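The post does not show `c_maxalign_sizeof`. One hypothetical definition, assuming C11's `max_align_t` and `_Alignof`, is sketched here with the length-prefixed string described earlier in the post. The `fat_*` function names are invented for this example:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical definition: sizeof(T) rounded up to a multiple of the
   maximum fundamental alignment (C11). */
#define c_maxalign_sizeof(T) \
    ((sizeof(T) + _Alignof(max_align_t) - 1) \
     / _Alignof(max_align_t) * _Alignof(max_align_t))

/* Allocate | size | string | and return a pointer to the string part. */
char *fat_strdup(const char *s)
{
    size_t n = strlen(s);
    unsigned char *mem = malloc(c_maxalign_sizeof(size_t) + n + 1);
    char *str;

    if (mem == NULL)
        return NULL;
    *(size_t *)(void *)mem = n;                     /* length stored at '^' */
    str = (char *)mem + c_maxalign_sizeof(size_t);  /* string starts at '*' */
    memcpy(str, s, n + 1);
    return str;
}

/* Step back over the prefix to read the stored length. */
size_t fat_strlen(const char *str)
{
    return *(const size_t *)(const void *)(str - c_maxalign_sizeof(size_t));
}

/* Free the address malloc returned, not the string pointer. */
void fat_free(char *str)
{
    if (str != NULL)
        free(str - c_maxalign_sizeof(size_t));
}
```

A pointer returned by `fat_strdup` can be passed to `strcmp` or `puts` like any other string, but it must be released with `fat_free` and never with `free`.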

You'll get varying opinions about how kosher this type of pointer hacking
is, but it is useful in certain contexts.

Best regards,
John D.

s***@casperkitty.com

2017-05-19 15:41:35 UTC

Post by j***@gmail.com
If you want to have some kind of fat pointer, like a string referenced with
a char* with a built-in size, you can still do that, but the key is to make
sure the alignment for the pointer to your string is valid for the maximum
alignment for your implementation.

That's the approach used by the BSTR type in Windows. Note that alignment
won't be an issue if either:

-1- The processor can handle unaligned reads (as with processors running
Windows), or

-2- Code that reads the length assembles it out of bytes

Except for really short strings, the time required to assemble the length
from multiple byte reads would be much longer than the time required to
use a single word read, but less than the time required to check each byte
of a string against zero.
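Option -2- can be sketched as below. The 32-bit little-endian layout is an assumption made for this example, and `read_len_le` and `write_len_le` are hypothetical names:

```c
#include <stdint.h>

/* Assemble a 32-bit length from four byte reads and three shifts.
   Works at any address, aligned or not. Little-endian layout assumed. */
uint32_t read_len_le(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store the length one byte at a time in the same layout. */
void write_len_le(unsigned char *p, uint32_t n)
{
    p[0] = (unsigned char)(n & 0xFF);
    p[1] = (unsigned char)((n >> 8) & 0xFF);
    p[2] = (unsigned char)((n >> 16) & 0xFF);
    p[3] = (unsigned char)((n >> 24) & 0xFF);
}
```

Because only `unsigned char` accesses are used, the code does not depend on the processor tolerating unaligned word reads.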

If portability to machines that don't accommodate unaligned reads were an
issue, I'd favor a variable-size encoding for the length (since short
strings are more common than long strings, a byte read and bit test may
be faster than four byte reads and three shifts) but if code will only be
run on platforms that support unaligned reads, the performance advantages
of being able to use a single 32-bit load may be sufficient to justify the
waste of space on short strings.
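One possible variable-size encoding, sketched under the assumption of seven payload bits per byte with the high bit meaning "more bytes follow" (LEB128 style). The post does not specify a scheme, and `varlen_encode` and `varlen_decode` are hypothetical names:

```c
#include <stddef.h>

/* Write n using 7 bits per byte, low-order group first.
   Returns the number of bytes written. */
size_t varlen_encode(unsigned char *out, unsigned long n)
{
    size_t i = 0;

    while (n >= 0x80) {
        out[i++] = (unsigned char)((n & 0x7F) | 0x80);  /* continuation bit set */
        n >>= 7;
    }
    out[i++] = (unsigned char)n;                        /* final byte, bit clear */
    return i;
}

/* Read a length written by varlen_encode. Returns the number of bytes
   consumed. Assumes well-formed input that fits in unsigned long. */
size_t varlen_decode(const unsigned char *in, unsigned long *n)
{
    size_t i = 0;
    unsigned long v = 0;
    unsigned shift = 0;

    while (in[i] & 0x80) {
        v |= (unsigned long)(in[i++] & 0x7F) << shift;
        shift += 7;
    }
    v |= (unsigned long)in[i++] << shift;
    *n = v;
    return i;
}
```

Lengths below 128 take a single byte, so the common short-string case costs one byte read and one bit test.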

A more interesting question is whether it's better to have a string pointer
identify the character storage directly, or allow for the possibility that
it might either target a string or a descriptor object. Allowing for the
latter would add a little overhead to code which processes strings, but
make it possible for a pointer to encapsulate a reference to either an
entire string object or an arbitrary substring thereof.

j***@gmail.com

2017-05-19 16:41:37 UTC

Post by s***@casperkitty.com
That's the approach used by the BSTR type in Windows. Note that alignment
won't be an issue if either:
-1- The processor can handle unaligned reads (as with processors running
Windows), or
-2- Code that reads the length assembles it out of bytes
Except for really short strings, the time required to assemble the length
from multiple byte reads would be much longer than the time required to
use a single word read, but less than the time required to check each byte
of a string against zero.
If portability to machines that don't accommodate unaligned reads were an
issue, I'd favor a variable-size encoding for the length (since short
strings are more common than long strings, a byte read and bit test may
be faster than four byte reads and three shifts) but if code will only be
run on platforms that support unaligned reads, the performance advantages
of being able to use a single 32-bit load may be sufficient to justify the
waste of space on short strings.
A more interesting question is whether it's better to have a string pointer
identify the character storage directly, or allow for the possibility that
it might either target a string or a descriptor object. Allowing for the
latter would add a little overhead to code which processes strings, but
make it possible for a pointer to encapsulate a reference to either an
entire string object or an arbitrary substring thereof.

The big problem with micro-optimization, to me, is that it's really hard to
get the packaging right so that you increase performance without
dramatically reducing the maintainability of the source code. The danger
is spending a lot of effort on writing fragile code that no one wants to
maintain, even if it's "fast".

Best regards,
John D.

Keith Thompson

2017-05-19 16:34:04 UTC

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

You don't believe in reading before you spout, I take it.

I do believe in it, though I don't always succeed. I have occasionally
made the mistake of posting without reading and understanding an entire
article, and when I do that I appreciate having it pointed out.

Post by j***@gmail.com
Go back, read the source you clipped and tell me that again.

Ok, I've re-read your previous article and I'll tell you that again.
The behavior is undefined.

Here's a demo program, based on your declaration above with the syntax
error corrected:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
typedef struct string_header_s {
short length; /* limited length */
char text[ 1 ];
} string_header_t;

string_header_t *p = malloc(sizeof *p + 10); /* more than we need */
if (p == NULL) exit(EXIT_FAILURE);
p->length = 4;
strcpy(p->text, "abcd");
printf("p->text[3] = '%c'\n", p->text[3]);
}

When I compile and run this on my system, I get no compile-time
diagnostics and the output, as expected, is:

p->text[3] = 'd'

But both the strcpy() call and the reference to p->text[3] have
undefined behavior. Concentrating on the latter, the indexing
operator [] is defined in terms of pointer addition, in this case
(p->text + 3). The description of pointer addition, in N1570
6.5.6p8, says:

If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.

The array object in question is p->text, which is of type char[1].
We've used malloc to allocate additional memory past the end of
that array object, but it's not part of the array object.

Most, perhaps all, C compilers will accept this and generate code
that behaves as expected, precisely because the struct hack is
a common idiom. Nevertheless, the behavior is not defined by
the standard.

If you disagree, I'm paying attention.

s***@casperkitty.com

2017-05-19 17:18:16 UTC

Post by Keith Thompson
p->text[3] = 'd'
But both the strcpy() call and the reference to p->text[3] have
undefined behavior. Concentrating on the latter, the indexing
operator [] is defined in terms of pointer addition, in this case
(p->text + 3). The description of pointer addition, in N1570
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.

When objects are nested, the C Standard is often unclear about whether
the term "object" refers to an inner object or the containing object.
Given, e.g.

int foo[4][3];
int *p = foo[2];

the C89 Standard could be read in such fashion as to suggest that p
could only be used to access integers within foo[2], but:

1. There are times when it would be useful or vital to be able to take
such a pointer and be able to access other elements of "foo".

2. If "p" were only usable to access integers within foo[2], the
Standard would define no means of producing a pointer that could
access all of "foo".

3. Compiler practice had, so far as I can tell, unanimously supported
the ability to use "p" to access all elements within "foo", and there
is no evidence that the authors of C89 intended to change that.

Under C89, a pointer received via malloc essentially behaves as a
"char[size]" which--for purposes of aliasing--also behaves as a union of
all types that would fit in that amount of space.

Thus, given:

struct s {int x; char z[1];} *p;
p=malloc(1000);

if offsetof(struct s,z) is 4, then p->z will be a pointer to the fifth byte
of a char[1000].
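The address relationship in that example can be checked with `offsetof`. This sketch only compares addresses and never dereferences past the declared member, so it takes no position on the disputed accesses. `z_offset_in_block` is a hypothetical name:

```c
#include <stddef.h>
#include <stdlib.h>

struct s { int x; char z[1]; };

/* Distance in bytes from the start of the malloc'ed block to p->z,
   or (size_t)-1 if the allocation fails. */
size_t z_offset_in_block(void)
{
    struct s *p = malloc(1000);
    size_t off;

    if (p == NULL)
        return (size_t)-1;
    off = (size_t)((char *)p->z - (char *)p);  /* address arithmetic only */
    free(p);
    return off;
}
```

The result equals `offsetof(struct s, z)`, which is 4 on an implementation where `int` occupies four bytes.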

Ben Bacarisse

2017-05-19 20:35:04 UTC

Keith Thompson <kst-***@mib.org> writes:
<snip>

Post by Keith Thompson
Here's a demo program, based on your declaration above with the syntax
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
typedef struct string_header_s {
short length; /* limited length */
char text[ 1 ];
} string_header_t;
string_header_t *p = malloc(sizeof *p + 10); /* more than we need */
if (p == NULL) exit(EXIT_FAILURE);
p->length = 4;
strcpy(p->text, "abcd");
printf("p->text[3] = '%c'\n", p->text[3]);
}
When I compile and run this on my system, I get no compile-time
p->text[3] = 'd'
But both the strcpy() call and the reference to p->text[3] have
undefined behavior. Concentrating on the latter, the indexing
operator [] is defined in terms of pointer addition, in this case
(p->text + 3). The description of pointer addition, in N1570
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.
The array object in question is p->text, which is of type char[1].
We've used malloc to allocate additional memory past the end of
that array object, but it's not part of the array object.

I think something might be missing from the standard in the area. I
think no one disputes that this is defined behaviour:

int *ip = malloc(10);
if (ip) ip[3] = 42;

but what is the array in question here? The storage allocated by malloc
has no effective type (yet) so how can ip and ip+3 point to elements of
the same array? (I've used an int * because there are special
dispensations for accessing any object's bytes as a sequence of chars.)

And does that mean we can make the struct hack valid like this:

p->text[3] = 'x'; // invalid
char *cp = (void *)p;
cp += offsetof(string_header_t, text);
cp[3] = 'x'; // this one OK?

I'd say yes and I think my answer would stay the same if the 'text'
member and the pointer access were to int types rather than char.
(You'd need a char * to do the first offsetof addition, of course.)

<snip>

--
Ben.

s***@casperkitty.com

2017-05-19 21:16:06 UTC

Post by Ben Bacarisse
I think something might be missing from the standard in the area. I
int *ip = malloc(10);
if (ip) ip[3] = 42;
but what is the array in question here? The storage allocated by malloc
has no effective type (yet) so how can ip and ip+3 point to elements of
the same array? (I've used an int * because there are special
dispensations for accessing any object's bytes as a sequence of chars.)

Under C89, in the absence of aliasing rules, there would be no problem
with describing malloc(N) as returning a pointer to a "char[N]" which
will happen to be aligned suitably for any type that might be contained
therein. Any pointers derived from that pointer could be indexed in
any fashion that would continue to point within that original array
object.

Ike Naar

2017-05-19 21:15:34 UTC

Post by Ben Bacarisse
I think something might be missing from the standard in the area. I
int *ip = malloc(10);
if (ip) ip[3] = 42;

If sizeof (int) > 2 the assignment to ip[3] writes beyond the
allocated space.

Ben Bacarisse

2017-05-19 22:54:05 UTC

Post by Ike Naar

Post by Ben Bacarisse
I think something might be missing from the standard in the area. I
int *ip = malloc(10);
if (ip) ip[3] = 42;

If sizeof (int) > 2 the assignment to ip[3] writes beyond the
allocated space.

Rats! That was not, of course, intended to be the point! I copied the
10 from the earlier example and forgot to check it. Assume small ints
or imagine I wrote malloc(100).
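The sizing slip above is usually avoided by deriving the byte count from the pointed-to type instead of writing a literal. A small sketch, with `store_fourth` as a hypothetical name:

```c
#include <stdlib.h>

/* Allocate room for exactly four ints, whatever sizeof (int) is,
   then use index 3. Returns the stored value, or -1 on failure. */
int store_fourth(void)
{
    int *ip = malloc(4 * sizeof *ip);
    int value;

    if (ip == NULL)
        return -1;
    ip[3] = 42;        /* within the allocated space by construction */
    value = ip[3];
    free(ip);
    return value;
}
```

Writing `sizeof *ip` keeps the allocation correct even if the element type of `ip` changes later.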

--
Ben.

Keith Thompson

2017-05-19 21:25:56 UTC

Ben Bacarisse <***@bsb.me.uk> writes:
[...]

N1570 7.22.3p1:

The pointer returned if the allocation succeeds is suitably
aligned so that it may be assigned to a pointer to any type of
object with a fundamental alignment requirement and then used
to access such an object or an array of such objects in the
space allocated (until the space is explicitly deallocated).

The wording does seem to imply that the pointer may be used
to access "such an object or an array of such objects" merely
*because* it's suitably aligned, which I'm not comfortable with,
but the intent seems to be that the allocation functions can and
do create accessible objects.

So a pointer to malloc()ed memory can be treated as a pointer to an
object or array of arbitrary types, but I don't think that implies that
an array member within a structure can be treated as if it were a bigger
array.

Post by Ben Bacarisse
p->text[3] = 'x'; // invalid
char *cp = (void *)p;
cp += offsetof(string_header_t, text);
cp[3] = 'x'; // this one OK?
I'd say yes and I think my answer would stay the same if the 'text'
member and the pointer access were to int types rather than char.
(You'd need a char * to do the first offsetof addition, of course.)
<snip>

I'd say that doesn't take advantage of the explicit permission granted
for malloc() and friends, so the behavior is undefined.

s***@casperkitty.com

2017-05-19 22:18:14 UTC

Post by Keith Thompson
The pointer returned if the allocation succeeds is suitably
aligned so that it may be assigned to a pointer to any type of
object with a fundamental alignment requirement and then used
to access such an object or an array of such objects in the
space allocated (until the space is explicitly deallocated).
The wording does seem to imply that the pointer may be used
to access "such an object or an array of such objects" merely
*because* it's suitably aligned, which I'm not comfortable with,
but the intent seems to be that the allocation functions can and
do create accessible objects.

Compilers had certainly treated such constructs that way prior to C89,
and I see no reason to believe the C89 authors intended--or even imagined--
that any compiler writers would interpret the lack of an explicit mandate
for such behavior as justification to do otherwise.

Post by Keith Thompson
So a pointer to malloc()ed memory can be treated as a pointer to an
object or array of arbitrary types, but I don't think that implies that
an array member within a structure can be treated as if it were a bigger
array.

Consider the declaration:

union { char small[2]; char big[1000];} *up;

Is there anything in C89 that would suggest that (char*)up->small,
(char*)up->big, and (char*)up would not all be equivalent expressions
of type "char*"? Bear in mind that a pointer to a union is equivalent
to a pointer to any of its members when suitably converted.

Now consider:

struct small_s { char x[2]; };
struct big_s { char x[2000]; };
union uu { struct small_s s; struct big_s b; } *up2;

Is there anything in C89 that would suggest that (char*)up2->s.x,
(char*)up2->b.x, and (char*)up2 would not all be equivalent expressions
of type "char*"? Bear in mind that a pointer to a struct is equivalent
to a pointer to its first member when suitably converted.

If one adds to the above declarations:

struct small_s *ssp;

would anything in C89 suggest that (char*)(ssp->x) and
(char*)((union uu*)ssp)->b.x would not be equivalent expressions of
type "char*"?

If a compiler can't see how pointer *ssp was created, would it have any
basis under C89 for assuming that (char*)(ssp->x) could not be a pointer
to char[2000]?
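
A minimal sketch of the address equalities these questions rely on, assuming a C99 or later compiler. The helper name same_start is hypothetical. It only compares the converted pointers; it does not settle which accesses through them are permitted.

```c
#include <assert.h>
#include <stddef.h>

struct small_s { char x[2]; };
struct big_s   { char x[2000]; };
union uu { struct small_s s; struct big_s b; };

/* Returns 1 if converting the union pointer, the small member array and
   the big member array to char * all yield the same address. */
int same_start(union uu *up2)
{
    assert(up2 != NULL);
    char *via_small = (char *)up2->s.x;  /* array decays, nothing is read */
    char *via_big   = (char *)up2->b.x;
    char *via_union = (char *)up2;
    return via_small == via_big && via_big == via_union;
}
```

Calling same_start on the address of any union uu object is expected to return 1, since a union pointer converts to a pointer to each member and a struct pointer converts to a pointer to its first member.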

Ben Bacarisse

2017-05-19 22:52:12 UTC

The pointer returned if the allocation succeeds is suitably
aligned so that it may be assigned to a pointer to any type of
object with a fundamental alignment requirement and then used
to access such an object or an array of such objects in the
space allocated (until the space is explicitly deallocated).
The wording does seem to imply that the pointer may be used
to access "such an object or an array of such objects" merely
*because* it's suitably aligned, which I'm not comfortable with,
but the intent seems to be that the allocation functions can and
do create accessible objects.

It falls just short of saying that it is an array. I'm sure we are
supposed to assume that the space returned by malloc is an array as far
as the explanation of pointer arithmetic is concerned, but I think it
could be more explicit.

Absolutely. I just wondered where the array is that ip and ip+3 point
into in my example.

I'd say that doesn't take advantage of the explicit permission granted
for malloc() and friends, so the behavior is undefined.

I'm not entirely sure of your objection. Is it that p, converted to a
char *, does not point to the while object that malloc returned but only
to the object that p points to?

--
Ben.

Keith Thompson

2017-05-19 23:42:55 UTC

It seems clear to me that it really is an array. Otherwise, "then used
to access such an object or an array of such objects in the space
allocated" wouldn't make any sense.

Absolutely. I just wondered where the array is that ip and ip+3 point
into in my example.

I'd say that doesn't take advantage of the explicit permission granted
for malloc() and friends, so the behavior is undefined.

I'm not entirely sure of your objection. Is it that p, converted to a
char *, does not point to the while object that malloc returned but only

(whole object)

Post by Ben Bacarisse
to the object that p points to?

Yes. You can treat the space allocated by malloc() as an object or as
an array of objects. That doesn't, I think, imply anything about
subsets of that memory.

(I'm not at all confident that I understand this correctly.)

Ben Bacarisse

2017-05-20 00:16:46 UTC

It seems clear to me that it really is an array. Otherwise, "then used
to access such an object or an array of such objects in the space
allocated" wouldn't make any sense.

I could argue that I accept that it can become an array -- after all the
effective type rules say the type of the object *becomes* that of
the effective type of the expression used to store data into it. That would
give some reason to talk about accessing an array in the space
allocated. In my example, no type had yet been determined.

But this point seems trivial. I certainly agree on what is intended. I
just expected it to be simpler to find words that explain what array ip
and ip+4 point into.

Absolutely. I just wondered where the array is that ip and ip+3 point
into in my example.

I'd say that doesn't take advantage of the explicit permission granted
for malloc() and friends, so the behavior is undefined.

I'm not entirely sure of your objection. Is it that p, converted to a
char *, does not point to the while object that malloc returned but only

(whole object)

Post by Ben Bacarisse
to the object that p points to?

Yes. You can treat the space allocated by malloc() as an object or as
an array of objects. That doesn't, I think, imply anything about
subsets of that memory.

So after

int *p = malloc(100);
char *cp = (void *)p;

cp now only points to one int and cp[sizeof (int) + 1] constructs an
invalid pointer?

Post by Keith Thompson
(I'm not at all confident that I understand this correctly.)

--
Ben.

Tim Rentsch

2017-05-20 08:58:39 UTC

Post by Ben Bacarisse
<snip>

I wasn't sure where to jump into this thread so here seems
as good a place as any.

Post by Ben Bacarisse
I think something might be missing from the standard in the area. I
int *ip = malloc(40); // [earlier size bug corrected]
if (ip) ip[3] = 42;
but what is the array in question here? The storage allocated by malloc
has no effective type (yet) so how can ip and ip+3 point to elements of
the same array? (I've used an int * because there are special
dispensations for accessing any object's bytes as a sequence of chars.)

Assuming 'sizeof (int) == 4', this malloc call returns a pointer
to a space that can be used as an array of 10 ints. Given that,
the index operation, and subsequent assignment operation, have
defined behavior. Asking "what is the array" is not important,
because the relevant passage for malloc(), etc, gives the right
to use the returned pointer value in the same way that a pointer
to the start of an array could be used.
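
As a rough illustration of that reading, here is a small sketch that sizes the request in terms of int rather than assuming sizeof (int) == 4. The function name ten_ints is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

enum { COUNT = 10 };

/* Allocates space usable as an array of COUNT ints, zeroes every element,
   then stores 42 in element 3. Returns NULL if the allocation fails.
   The caller is responsible for calling free(). */
int *ten_ints(void)
{
    int *ip = malloc(COUNT * sizeof *ip);
    if (ip == NULL)
        return NULL;
    for (int i = 0; i < COUNT; i++)
        ip[i] = 0;
    assert(3 < COUNT);   /* the index used below is inside the array */
    ip[3] = 42;
    return ip;
}
```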

Post by Ben Bacarisse
/*[previously]*/ string_header_t *p = malloc(sizeof *p + 10);
p->text[3] = 'x'; // invalid
char *cp = (void *)p;
cp += offsetof(string_header_t, text);
cp[3] = 'x'; // this one OK?
I'd say yes

In my view there is no question that the behavior involving 'cp'
is defined, actually for two different reasons. The more subtle
reason I will discuss below. The more obvious reason is that for
any object, including the implied array object pointed to by 'p',
it's always allowed to convert a pointer to the start of the
object to a pointer-to-character type, and access the object as
an array of bytes. (Some people might quibble about having to
increment pointers one-by-one to access the characters, but I'm
going to ignore that since it doesn't pass the laugh test.)
Surely the key provision in the first paragraph regarding memory
management functions, ie, the one that says the returned pointer
from malloc() and friends may be

used to access such an object or an array of such objects in
the space allocated

is meant to include the ability to access the same space as a
character array. The operations on 'cp' are allowed under that
umbrella.
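
A sketch of the character-pointer route described above, assuming the one-element text member from earlier in the thread. The helper names header_from and header_char are hypothetical. All character accesses go through a char pointer derived from the start of the allocation, never through p->text.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct string_header_s {
    short length;
    char text[1];
} string_header_t;

/* Allocates a header plus room for src and copies src (with its
   terminator) through a char pointer to the start of the allocation.
   Returns NULL on allocation failure; the caller frees the result. */
string_header_t *header_from(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    assert(len <= (size_t)SHRT_MAX);
    string_header_t *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->length = (short)len;
    char *cp = (char *)p + offsetof(string_header_t, text);
    memcpy(cp, src, len + 1);
    return p;
}

/* Reads character i by the same route, without indexing p->text. */
char header_char(const string_header_t *p, size_t i)
{
    assert(p != NULL);
    const char *cp = (const char *)p + offsetof(string_header_t, text);
    return cp[i];
}
```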

Post by Ben Bacarisse
and I think my answer would stay the same if the 'text'
member and the pointer access were to int types rather than char.
(You'd need a char * to do the first offsetof addition, of course.)

Here we get to the heart of the matter, and the more subtle
reasoning alluded to above. Again I believe the behavior
is defined (with a very small asterisk covering a DS9000
type implementation, as explained further below).

Suppose first, as will most commonly be true, that the offset of
the int member in question is a multiple of sizeof (int). We are
allowed to take the return value of malloc() and convert it to
an 'int *', and use that to access an array of ints. Casting the
pointer 'p' to (void *) gives that same value back (please no
quibbles about "same" versus "equal"), which lets us use that
value to access an array of ints. So something like this

int *ip = (void *)p;
ip += offsetof(string_header_t, ints) / sizeof (int);
ip[3] = 42;

has to work. (Using a char * initially and using a byte offset
to increment that pointer is not an important difference.)

Let me say this again more directly, to convey the impact of what
the Standard says about malloc() values. We may do this:

void *v = malloc( 10000 );
int *ip = v;
long *lp = v;
float *fp = v;
string_header_t *shp = v;

after which /all/ of these pointers may be used to access arrays
of their respective types. As long as we don't run afoul of
effective type rules, pointers to space returned by malloc() may
be freely mixed and matched, intermingled, converted to char *
and adjusted by byte lengths, etc, and everything works. In
effect, the type of space returned by malloc() is a union over
/all possible types/ that will fit in the space, including array
types. Treating part of the space one way and another part a
different way is allowed in all cases (again, assuming no
interference from effective type rules or struct/union type
overlap). It is this broad guarantee that gives workaround
code like that shown above defined behavior.

Now for the asterisk. Suppose the byte offset for the "struct
hack" member (of type int[1]) is not a multiple of sizeof(int).
This means an int array starting at the beginning of the malloc()
space doesn't do the job for us. Normally this won't be a
problem, since in addition to

struct { short s; int i1[1]; }

there will be types like

struct { short s; int i10[10]; }
struct { short s; int i100[100]; }
struct { short s; int i1000[1000]; }

with the same offset for the int array members as that of the
first "struct hack" type. However, a perverse implementation
could choose a different offset for the extent 1 case and all
cases with extent greater than 1, which means in principle the
workaround access scheme would give undefined behavior on such
implementations. Not counting such far-fetched scenarios however
the behavior is defined.
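
The offset assumption can be checked on a given implementation with offsetof. This is only a sketch: the struct tags and function names are invented for the example, and the check reports what one compiler does, not what the Standard requires.

```c
#include <assert.h>
#include <stddef.h>

struct sh1    { short s; int i1[1]; };
struct sh10   { short s; int i10[10]; };
struct sh1000 { short s; int i1000[1000]; };

/* Returns 1 if the int array member sits at the same byte offset in all
   three struct types on this implementation. */
int offsets_agree(void)
{
    size_t o1    = offsetof(struct sh1, i1);
    size_t o10   = offsetof(struct sh10, i10);
    size_t o1000 = offsetof(struct sh1000, i1000);
    return o1 == o10 && o10 == o1000;
}

/* Returns the index, counted in ints from the start of the struct, at
   which the array member begins. Asserts the "common case" from the
   post: the byte offset is a multiple of sizeof (int). */
size_t member_index_in_ints(void)
{
    size_t off = offsetof(struct sh1, i1);
    assert(off % sizeof(int) == 0);
    return off / sizeof(int);
}
```

On mainstream implementations offsets_agree is expected to return 1; an implementation of the perverse kind described above would make it return 0.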

(It occurs to me now I should mention one other thing, namely,
storing into a member of a struct gives the freedom to put
indeterminate values into bytes of the struct that don't
correspond to other members. So if there is padding after the
"struct hack" member, and a struct member is stored into after
the SH member array has been set (in those locations), then that
is also a potential conflict. Probably not likely to make a
difference, but I try to be thorough.)

j***@gmail.com

2017-05-20 09:25:32 UTC

Post by Ben Bacarisse
<snip>

I wasn't sure where to jump into this thread so here seems
as good a place as any.

In my view there is no question that the behavior involving 'cp'
is defined, actually for two different reasons. The more subtle
reason I will discuss below. The more obvious reason is that for
any object, including the implied array object pointed to by 'p',
it's always allowed to convert a pointer to the start of the
object to a pointer-to-character type, and access the object as
an array of bytes. (Some people might quibble about having to
increment pointers one-by-one to access the characters, but I'm
going to ignore that since it doesn't pass the laugh test.)
Surely the key provision in the first paragraph regarding memory
management functions, ie, the one that says the returned pointer
from malloc() and friends may be
used to access such an object or an array of such objects in
the space allocated
is meant to include the ability to access the same space as a
character array. The operations on 'cp' are allowed under that
umbrella.

Here we get to the heart of the matter, and the more subtle
reasoning alluded to above. Again I believe the behavior
is defined (with a very small asterisk covering a DS9000
type implementation, as explained further below).
Suppose first, as will most commonly be true, that the offset of
the int member in question is a multiple of sizeof (int). We are
allowed to take the return value of malloc() and convert it to
an 'int *', and use that to access an array of ints. Casting the
pointer 'p' to (void *) gives that same value back (please no
quibbles about "same" versus "equal"), which lets us use that
value to access an array of ints. So something like this
int *ip = (void *)p;
ip += offsetof(string_header_t, ints) / sizeof (int);
ip[3] = 42;
has to work. (Using a char * initially and using a byte offset
to increment that pointer is not an important difference.)
Let me say this again more directly, to convey the impact of what
void *v = malloc( 10000 );
int *ip = v;
long *lp = v;
float *fp = v;
string_header_t *shp = v;
after which /all/ of these pointers may be used to access arrays
of their respective types. As long as we don't run afoul of
effective type rules, pointers to space returned by malloc() may
be freely mixed and matched, intermingled, converted to char *
and adjusted by byte lengths, etc, and everything works. In
effect, the type of space returned by malloc() is a union over
/all possible types/ that will fit in the space, including array
types. Treating part of the space one way and another part a
different way is allowed in all cases (again, assuming no
interference from effective type rules or struct/union type
overlap). It is this broad guarantee that gives workaround
code like that shown above defined behavior.
Now for the asterisk. Suppose the byte offset for the "struct
hack" member (of type int[1]) is not a multiple of sizeof(int).
This means an int array starting at the beginning of the malloc()
space doesn't do the job for us. Normally this won't be a
problem, since in addition to
struct { short s; int i1[1]; }
there will be types like
struct { short s; int i10[10]; }
struct { short s; int i100[100]; }
struct { short s; int i1000[1000]; }
with the same offset for the int array members as that of the
first "struct hack" type. However, a perverse implementation
could choose a different offset for the extent 1 case and all
cases with extent greater than 1, which means in principle the
workaround access scheme would give undefined behavior on such
implementations. Not counting such far-fetched scenarios however
the behavior is defined.
(It occurs to me now I should mention one other thing, namely,
storing into a member of a struct gives the freedom to put
indeterminate values into bytes of the struct that don't
correspond to other members. So if there is padding after the
"struct hack" member, and a struct member is stored into after
the SH member array has been set (in those locations), then that
is also a potential conflict. Probably not likely to make a
difference, but I try to be thorough.)

Ick. Yes.

I was forgetting the struct member ordering and padding problems.

I suppose that would be the reason for wanting to use the empty size
as a flag to say that the array member should come at the end, laid out
to allow arbitrary lengths, where, with arrays of specified sizes, the
compiler "ought to be" free to optimize the structure, and thus re-order
the elements and add padding, etc.

Pragmas exist for some compilers, but I wonder if anyone has ever
suggested a struct type modifier that would require strict layout rules,
under the same sort of reasoning that gave us the volatile modifier.

Seventeen years of separation puts cruft in the brain.

Apologies to all who corrected me.

--
Joel Rees

Delusions of being a novelist:
http://reiisi.blogspot.com/p/novels-i-am-writing.html

j***@gmail.com

2017-05-20 05:09:36 UTC

typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

You don't believe in reading before you spout, I take it.

I've done such things, too.

And I have to apologize for getting peeved the other day.

The conversation has moved in the direction I intended, so I won't bother
adding empty bits here.

Well, except, JFTR.

Except that it is turning out to be more than just for the record.
Having read ahead in the thread, I'll borrow this point to start
on another tack.

Post by j***@gmail.com
Go back, read the source you clipped and tell me that again.

Ok, I've re-read your previous article and I'll tell you that again.
The behavior is undefined.
Here's a demo program, based on your declaration above with the syntax

I assume you mean by eliminating the allocation test I messed up. :)

Post by Keith Thompson
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    typedef struct string_header_s {
        short length; /* limited length */
        char text[ 1 ];
    } string_header_t;
    string_header_t *p = malloc(sizeof *p + 10); /* more than we need */
    if (p == NULL) exit(EXIT_FAILURE);
    p->length = 4;
    strcpy(p->text, "abcd");
    printf("p->text[3] = '%c'\n", p->text[3]);
}
When I compile and run this on my system, I get no compile-time
p->text[3] = 'd'
But both the strcpy() call and the reference to p->text[3] have
undefined behavior. Concentrating on the latter, the indexing
operator [] is defined in terms of pointer addition, in this case
(p->text + 3). The description of pointer addition, in N1570
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.
The array object in question is p->text, which is of type char[1].
We've used malloc to allocate additional memory past the end of
that array object, but it's not part of the array object.
Most, perhaps all, C compilers will accept this and generate code
that behaves as expected, precisely because the struct hack is
a common idiom. Nevertheless, the behavior is not defined by
the standard.
If you disagree, I'm paying attention.

As Ben notes, it's hard to say that malloc doesn't return an array.

But that was part of the reason I did my own allocation, to make
it clear that what I was working on was, in fact, within one array.

Which leaves us with the question of which array we look at when,
and I'm thinking that ...

------------ internal processing record --------------
well, in my mind, Microsoft spearheaded the aggressive (Can I call
it that?) compile time analysis of C objects. That may not be where
I'm trying to head.

Pascal, as it was originally envisioned, kept a run-time record of
array definitions so that limit checks could be performed at run-time.

If the implementor made that record part of the array object,
physically (so to speak) adjacent to the array, making arrays of
arrays became messy.

C originally kept no such run-time record, which made it much easier
to do multi-dimensioned arrays as just arrays of arrays.

But private allocation is a bigger reason.
------------ end internal processing record --------------

If I had to worry that the compiler would be free to fight with me over
the interpretation of the array in the header type I've shown, I'd have
a really hard time writing allocation records when I'm doing my own
allocation. (I'm including counted strings in this.)

Unless my allocation records, in spite of being adjacent to the
allocated block, included a pointer to the block.

Allocation records adjacent to the allocated blocks can be argued as
being good or bad, but the compiler should not militate against the
technique.

Which leaves us with the empty size declaration as the standard
approved syntactic flag (which I had missed somehow over the last
fifteen years), and the question of whether a size of an array
in such a declaration should be considered by the compiler as a
minimum or maximum, or should be considered at all.

I think the original specification of the language intended to
allow size declarations as indicating minimums as well as maximums.

In other words, the size of the struct would be the minimum
allocatable, and the programmer would know whether he or she had
actually allocated more.

Let's try another header:

typedef struct string_header_s {
short actual_length;
short allocated_max;
char text[ 10 ];
} string_header_t;

And we'll assume code supporting the apparent meaning of the field
names.

Compare that to the form that the standard would urge on us:

typedef struct string_header_s {
short actual_length;
short allocated_max;
char text[]; /* Allocation minimum MUST be 10 */
} string_header_t;

An assertion could help, but, with the empty size declaration, what
could we assert on that would be (effectively) any more meaningful
than the comment?
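
One possible answer, sketched under assumptions: with the empty size declaration there is nothing in the type to assert on, but the allocated_max field can carry the minimum at run time. The constant TEXT_MIN and the function string_put are hypothetical, and this string_allocate is an illustration, not the allocator from the earlier posts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { TEXT_MIN = 10 };  /* hypothetical minimum, taken from the comment */

typedef struct string_header_s {
    short actual_length;
    short allocated_max;
    char text[];             /* flexible array member (C99) */
} string_header_t;

/* Allocates a header with at least TEXT_MIN characters of text and
   records the capacity. Returns NULL on failure; the caller frees. */
string_header_t *string_allocate(short capacity)
{
    if (capacity < TEXT_MIN)
        capacity = TEXT_MIN;
    string_header_t *p = malloc(sizeof *p + (size_t)capacity);
    if (p == NULL)
        return NULL;
    p->actual_length = 0;
    p->allocated_max = capacity;
    return p;
}

/* Stores c at index i, asserting on the recorded capacity rather than
   on a declared array size. */
void string_put(string_header_t *p, short i, char c)
{
    assert(p != NULL);
    assert(p->allocated_max >= TEXT_MIN);
    assert(i >= 0 && i < p->allocated_max);
    p->text[i] = c;
    if (i >= p->actual_length)
        p->actual_length = (short)(i + 1);
}
```

The assertion is only as trustworthy as the code that fills in allocated_max, which is roughly the point of the question.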


--
Joel Rees


s***@casperkitty.com

2017-05-20 06:23:27 UTC

Post by j***@gmail.com
I think the original specification of the language intended to
allow size declarations as indicating minimums as well as maximums.

In the closest thing I know to original specification of the language--the
1974 C Reference Manual written by the inventor of the language found at

https://www.bell-labs.com/usr/dmr/www/cman.pdf

on a machine with 16-bit int, and 8-bit char, the array dimension in the
lines marked with /*****/ serves two identifiable purposes.

struct foo {
int moe;
char larry[10]; /*****/
int curly;
int shemp[6]; /*****/
};

it serves to tell the compiler how far the next item (if any) or the end
of the struct should be from the start of the current item, and it allows
the compiler to compute the value it should report if code calls "sizeof"
on one of those members. If member larry or shemp of a structure is used
in anything other than an lvalue expression, the compiler will take the
address of the structure, add 2 or 14, and process the resulting address
as either a char* or int*; once that is done, the compiler will have no
reason to know or care about the size of the array in question. Note that
in the 1974 CRM, the meaning of structure operations is *defined* in terms
of addresses, thus unambiguously defining a number of useful constructs
which some of today's compiler writers claim were never defined.
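
To make the "add 2 or 14" arithmetic concrete on a modern machine, the sketch below substitutes int16_t for the 16-bit int of the 1974 manual. That substitution, the assumption that int16_t has 2-byte alignment, and the helper names are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
    int16_t moe;
    char    larry[10];
    int16_t curly;
    int16_t shemp[6];
};

/* Address of member larry, computed as "struct address plus offset". */
char *larry_addr(struct foo *p)
{
    assert(p != NULL);
    return (char *)p + offsetof(struct foo, larry);
}

/* Address of member shemp, computed the same way. */
int16_t *shemp_addr(struct foo *p)
{
    assert(p != NULL);
    return (int16_t *)((char *)p + offsetof(struct foo, shemp));
}
```

Under those assumptions offsetof(struct foo, larry) is 2 and offsetof(struct foo, shemp) is 14, and the computed addresses compare equal to f.larry and f.shemp for any struct foo f.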

James R. Kuyper

2017-05-19 17:33:08 UTC

...

Post by j***@gmail.com
typedef string_header_s struct
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;

You don't believe in reading before you spout, I take it.
Go back, read the source you clipped and tell me that again.
If you prove that you have read and understood the source that you
say is undefined, I'll listen to you.
typedef struct string_header_s
{ short length;
char string[ 1 ];
} string_header_t;

...

Post by j***@gmail.com
string_header_t * string_save( char string[] )
{ long length = strlen( string );
  string_header_t * headerp = string_allocate( length );
  memcpy( headerp->string, string, length );
  headerp->string[ length ] = '\0';
  return headerp;
}

...

Post by j***@gmail.com
thing = string_save( "hello" );

That's all that's needed to prove that the behavior is undefined - the
details of string_allocate() are irrelevant, so long as it returns a
correctly aligned pointer to sufficient memory - if it failed to do so,
that would in itself be justification for the behavior to be undefined.

When string_save("hello") is called, it sets length = strlen("hello"),
which is 5. Later on, it evaluates headerp->string[length], which is
equivalent to *(headerp->string + length).

Here's what the standard says about that addition:

"When an expression that has integer type is added to or subtracted from
a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression. In
other words, if the expression P points to the i-th element of an array
object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N
has the value n) point to, respectively, the i+n-th and i−n-th elements
of the array object, provided they exist. Moreover, if the expression P
points to the last element of an array object, the expression (P)+1
points one past the last element of the array object, and if the
expression Q points one past the last element of an array object, the
expression (Q)-1 points to the last element of the array object. If
both the pointer operand and the result point to elements of the same
array object, or one past the last element of the array object, the
evaluation shall not produce an overflow; otherwise, the behavior is
undefined." (6.5.6p8)

The relevant array is headerp->string, and it has a length of exactly 1.
headerp->string+5 does not point at any element of that array, nor does
it point one past the end of that array. Therefore, "the behavior is
undefined".

This is a relatively safe example of undefined behavior. Most, possibly
all, C90 compilers dealt with such code exactly as you incorrectly think
they were required to deal with it - that's one of the possibilities
that's allowed when the behavior is undefined. That doesn't change the
fact that the behavior is, in fact, undefined.

If you don't believe my argument is valid, check the official committee
responses to DR 051 and DR 072:
<http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_051.html>
<http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_072.html>.

However, in C99, flexible array members were invented, which use syntax
that is only slightly different than that of the struct hack, to give
defined behavior to code that is otherwise quite similar to what you
wrote. The key clause that makes the difference says: "However, when a .
(or ->) operator has a left operand that is (a pointer to) a structure
with a flexible array member and the right operand names that member, it
behaves as if that member were replaced with the longest array (with the
same element type) that would not make the structure larger than the
object being accessed;" (6.7.2.1p18)

Since you allocated a block of memory large enough to hold "hello", if
you had used the flexible array member syntax, that block would have
been "the object" to which 6.7.2.1p18 refers, and "headerp->string"
would therefore have been treated as if it were declared string[6], in
which case headerp->string+length would have had defined behavior.
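
For comparison, here is a sketch of string_save rewritten around a flexible array member, assuming C99 or later. The original string_allocate was not shown in full, so this version calls malloc directly; the length check and the const parameter are additions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct string_header_s {
    short length;
    char string[];   /* flexible array member instead of string[ 1 ] */
} string_header_t;

/* Copies string into a freshly allocated header. Returns NULL if the
   string is too long for a short length or if allocation fails.
   The caller frees the result. */
string_header_t *string_save(const char *string)
{
    assert(string != NULL);
    size_t length = strlen(string);
    if (length > (size_t)SHRT_MAX)
        return NULL;
    string_header_t *headerp = malloc(sizeof *headerp + length + 1);
    if (headerp == NULL)
        return NULL;
    headerp->length = (short)length;
    memcpy(headerp->string, string, length);
    headerp->string[length] = '\0';   /* within the allocated object */
    return headerp;
}
```

With string_save("hello") the allocation has room for six characters after the header, so headerp->string[5] falls inside the array that 6.7.2.1p18 says the member behaves as.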

Because the standard has given its official blessing to the use of
flexible array members, I think there's a small but real chance that
some compilers will no longer support the struct hack. The simplest way
to do that would be to implement run-time array-bounds checking.
However, run-time array-bounds checking is very expensive. A more
plausible way for the struct hack to fail would be optimizations based
upon the unchecked assumption that headerp->string will never have any
integer added to it with a value other than 0 or 1, and that if the
amount added is 1, the resulting pointer will never be dereferenced.
Arbitrarily obscure failure modes could occur as a result of such
optimizations if that assumption is violated.

Still, the struct hack was very widely used before flexible array
members were invented, and I wouldn't recommend holding your breath
waiting for a compiler to come out that breaks it.

s***@casperkitty.com

2017-05-19 20:44:26 UTC

Post by James R. Kuyper
If you don't believe my argument is valid, check the official committee
<http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_051.html>
<http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_072.html>.

How are defect reports intended to be interpreted? What kind of wording
in a defect report would distinguish among the following scenarios:

1. The authors of the Standard almost certainly intended to say that X
was defined, but the wording of the Standard is defective and says
that it is not. The next version of the Standard should fix the wording
so that X is unambiguously defined; until then, compilers should treat
X as defined.

2. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, but it is unlikely that any programmers would
be relying upon that. The next version of the Standard should improve
the wording to make clear that X is not defined, and programmers should
expect that compilers may interpret X as undefined unless they
explicitly document otherwise.

3. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, and programmers might reasonably be relying
upon that. The next version of the Standard should define a standard
macro to say whether a particular implementation treats X as defined;
until then, compilers should either treat X as defined or explicitly
document that they do not.

4. It is unclear whether the authors of the Standard intended X to be
defined or undefined; the wording of the Standard would seem to
suggest that it is not defined, but could also be construed as saying
that it is defined, and programmers might reasonably be relying upon
that. The Standard is clearly defective in some fashion, but without
knowing the intended meaning it's impossible to know in which way it's
defective or what should be done to "fix" it.

A finding that X is "undefined" would be consistent with any of the above,
but the different scenarios should be handled differently.

j***@verizon.net

2017-05-20 02:20:39 UTC

How are defect reports intended to be interpreted? What kind of wording
1. The authors of the Standard almost certainly intended to say that X
was defined, but the wording of the Standard is defective and says
that it is not. The next version of the Standard should fix the wording
so that X is unambiguously defined; until then, compilers should treat
X as defined.
2. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, but it is unlikely that any programmers would
be relying upon that. The next version of the Standard should improve
the wording to make clear that X is not defined, and programmers should
expect that compilers may interpret X as undefined unless they
explicitly document otherwise.
3. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, and programmers might reasonably be relying
upon that. The next version of the Standard should define a standard
macro to say whether a particular implementation treats X as defined;
until then, compilers should either treat X as defined or explicitly
document that they do not.
4. It is unclear whether the authors of the Standard intended X to be
defined or undefined; the wording of the Standard would seem to
suggest that it is not defined, but could also be construed as saying
that it is defined, and programmers might reasonably be relying upon
that. The Standard is clearly defective in some fashion, but without
knowing the intended meaning it's impossible to know in which way it's
defective or what should be done to "fix" it.
A finding that X is "undefined" would be consistent with any of the above,
but the different scenarios should be handled differently.

No, a finding that "X is undefined" is not consistent with any of the above.
Unless modified by the addition of words saying something else, "X is undefined"
means nothing more or less than:

"The committee intended the behavior to be undefined, and in the opinion of the
committee, the standard expresses that intent clearly enough, and anyone who's
been interpreting it in a contrary manner should adjust their interpretations
accordingly."

As far as I know (and I haven't done an exhaustive search, so I could be
mistaken), the committee tends not to resolve defect reports by telling people
that they should act as though the standard says something other than what it
actually says. That would constitute an official change to the standard, and
making such changes is what Technical Corrigenda (TC) are for. The resolution of
a defect report may recommend issuing a TC, but the resolution itself does not
change the meaning of the standard - that's achieved only when the TC has been
approved.

All of the other nuances you mention above can appear in a defect report, and I
believe that most of them have appeared in at least one such report. But they
are never implied by the statement "X is undefined". Such things as "the wording
does not correctly express the intent of the committee" or "the wording could be
improved to make the intent clearer" are stated explicitly, when the committee
believes them to be applicable.

s***@casperkitty.com

2017-05-20 06:30:21 UTC

Post by j***@verizon.net

Post by s***@casperkitty.com
How are defect reports intended to be interpreted? What kind of wording
1. The authors of the Standard almost certainly intended to say that X
was defined, but the wording of the Standard is defective and says
that it is not. The next version of the Standard should fix the wording
so that X is unambiguously defined; until then, compilers should treat
X as defined.
2. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, but it is unlikely that any programmers would
be relying upon that. The next version of the Standard should improve
the wording to make clear that X is not defined, and programmers should
expect that compilers may interpret X as undefined unless they
explicitly document otherwise.
3. The authors of the Standard intended to say that X is undefined, but
the wording of the Standard is defective and could be misconstrued as
saying that X is defined, and programmers might reasonably be relying
upon that. The next version of the Standard should define a standard
macro to say whether a particular implementation treats X as defined;
until then, compilers should either treat X as defined or explicitly
document that they do not.
4. It is unclear whether the authors of the Standard intended X to be
defined or undefined; the wording of the Standard would seem to
suggest that it is not defined, but could also be construed as saying
that it is defined, and programmers might reasonably be relying upon
that. The Standard is clearly defective in some fashion, but without
knowing the intended meaning it's impossible to know in which way it's
defective or what should be done to "fix" it.
A finding that X is "undefined" would be consistent with any of the above,
but the different scenarios should be handled differently.

No, a finding that "X is undefined" is not consistent with any of the above.
Unless modified by the addition of words saying something else, "X is undefined"
"The committee intended the behavior to be undefined, and in the opinion of the
committee, the standard expresses that intent clearly enough, and anyone who's
been interpreting it in a contrary manner should adjust their interpretations
accordingly."

My question was "what kind of wording would distinguish among the various
possibilities". If the Standard had been adequately clear, I would think
there would be no need for a defect report. If any significant number of
people reading the Standard aren't certain about what it means, that would
constitute prima facie evidence that at minimum the Standard is less clear
than it should be.

j***@verizon.net

2017-05-20 14:20:51 UTC

...

Post by j***@verizon.net
No, a finding that "X is undefined" is not consistent with any of the above.
Unless modified by the addition of words saying something else, "X is undefined"
"The committee intended the behavior to be undefined, and in the opinion of the
committee, the standard expresses that intent clearly enough, and anyone who's
been interpreting it in a contrary manner should adjust their interpretations
accordingly."

There's no description so clear that no one can misunderstand it. If the
committee thinks that the description is either wrong, or even merely unclear,
the committee's resolution to a DR will express that opinion explicitly. In the
absence of any statement to that effect, the DR resolution is intended only to
instruct people in the correct interpretation of the standard, even if it's not
the one that would have occurred to them without instruction.

j***@verizon.net

2017-05-20 14:46:42 UTC

Post by j***@verizon.net

My question was "what kind of wording would distinguish among the various
possibilities".

It just occurred to me that I've only answered that question in an oblique way.
I should answer it directly. First of all, your list of the various
possibilities did not include the one possibility that actually applied.

Secondly, the kind of wording that would distinguish those possibilities is
wording that explicitly states the things you were hoping were implied. The
possibility that the committee had decided that the wording did not correctly
convey the committee's intent would be conveyed by explicit words to that
effect. The possibility that the committee had concluded that the person filing
the DR was justified in being confused, because the wording was confusing, would
be conveyed by an explicit statement that the wording was confusing, and needed
to be improved. A recommendation that a future version of the standard should be
modified to address these flaws is implicit in either of those cases - but it is
most definitely NOT implicit in a statement that "X is undefined".

One of the possibilities that you mentioned is " The next version of the
Standard should define a standard macro to say whether a particular
implementation treats X as defined; until then, compilers should either treat X
as defined or explicitly document that they do not." That is not something that
the committee would ever say, so you don't need to wonder about what wording
they'd use to convey that meaning. First of all, if the standard's wording does
need improving, the improvement will consist of the standard either defining X,
or saying that X is undefined, or leaving X implicitly undefined by failing to
define it. They would not leave the question of whether or not it is defined to
be determined by checking a macro. The committee has created optional features
in C, but never have they done so in response to a defect report questioning
whether or not something is defined.

Secondly, committee resolutions never take the form of advice for implementors
about how to deal with the issue raised by the DR - they either express what the
committee believes is the correct interpretation of the words as they are
currently written (in which cases implementors who wish to conform to the
standard should modify their implementation, if necessary, to match that
interpretation), or they recommend that the words be changed (in which case
implementors have the option of anticipating the correction by implementing it
as an extension now, or waiting until the correction has been approved before
implementing it as a conformance requirement).

You also mentioned "The Standard is clearly defective in some fashion, but
without knowing the intended meaning it's impossible to know in which way it's
defective or what should be done to "fix" it." as a possibility. It's not
appropriate to mark a DR as resolved if there's still that much uncertainty
about the issue. The committee should know what it intended - if it doesn't,
then resolving the DR would be premature.

s***@casperkitty.com

2017-05-22 17:50:28 UTC

Post by j***@verizon.net
I should answer it directly. First of all, your list of the various
possibilities did not include the one possibility that actually applied.

I'd say #2 would seem to match the case you describe; if the Standard
were adequately clear on a subject, it should be possible to answer
any questions about its meaning merely by citing and quoting the relevant
portions; if a question that needs to be answered cannot be answered
merely by quoting the Standard, I would regard that as indicative of a
defect which should be corrected by, at minimum, adding a footnote to
future standards that would clarify the issue.

Also, who writes the answers to DRs? Are they voted upon by the Committee as
a whole? There are certain places where the Standard appears to deliberately
avoid clarifying examples that are so obviously essential that the only
plausible explanations are either that the Committee was being absurdly dumb
or else no consensus could be reached as to whether they should be defined
or not. For example, under 6.5.2.3, there should be an example where structs
containing a Common Initial Sequence are accessed via pointers of their
respective types, at places where a complete union definition containing
both structures is visible. I personally think the language is quite clear
as saying such behavior is defined (saying CIS members may be inspected
"anywhere that a declaration of the complete type of the union is visible"
would be redundant if such inspection had to be done through lvalues of
union type, since such accesses can't possibly be made anywhere else) but
compiler writers don't follow that.
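For reference, here is a minimal sketch of the one form of Common Initial Sequence access that is not in dispute: inspecting the shared member through the union object itself. The type and function names (shape_circle, shape_rect, shape_kind, shape_demo) are made up for illustration. The sketch deliberately does not show the contested case of going through separate pointers to the struct types.

```c
struct shape_circle { int kind; double radius; };
struct shape_rect   { int kind; double w, h; };

/* Both structs begin with "int kind", so they share a common initial sequence. */
union shape {
    struct shape_circle circle;
    struct shape_rect   rect;
};

/* Reads the shared leading member through an lvalue of the union's member.
   6.5.2.3 permits this whichever of the two structs was stored last. */
int shape_kind(const union shape *s)
{
    return s->circle.kind;
}

/* Stores a rect, then inspects the kind through the circle member. */
int shape_demo(void)
{
    union shape s;
    s.rect.kind = 2;
    s.rect.w = 3.0;
    s.rect.h = 4.0;
    return shape_kind(&s);
}
```

Here shape_demo() returns 2. The disagreement described above only starts when the access goes through a plain struct shape_circle * instead.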

Post by j***@verizon.net
Secondly, committee resolutions never take the form of advice for implementors
about how to deal with the issue raised by the DR - they either express what the
committee believes is the correct interpretation of the words as they are
currently written (in which cases implementors who wish to conform to the
standard should modify their implementation, if necessary, to match that
interpretation), or they recommend that the words be changed (in which case
implementors have the option of anticipating the correction by implementing it
as an extension now, or waiting until the correction has been approved before
implementing it as a conformance requirement).

There are a number of places where it would make sense for 99.9% of
implementations to behave the same way, and to allow programmers to
exploit that if they don't need to support the unusual 0.1%. The
Standard leaves such behaviors Undefined so as to avoid forbidding
implementations from doing some other sensible behavior, sometimes
even when an earlier standard defined the behavior (e.g. -1<<1).

What's really needed, but what the Standard has avoided for some (likely
political) reasons, would be an effort to recognize behaviors that are
commonplace but not universal. Ironically, I suspect it's politically
easier to recognize a behavior that's supported by 50% of implementations
than one that's supported by 99%, since the latter would be seen as
marginalizing the 1%.

James R. Kuyper

2017-05-18 16:37:33 UTC

Post by Tim Rentsch
[...]

Post by j***@gmail.com
An indefinite array type has more limitations than a type of an
array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

typedef struct string_header_s
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;
And before you C++ types came around, we just left the array size
out: char text[].

I don't know what connection you think this has to C++, but I suspect
you're mistaken. Unlike C, C++ has never given well-defined behavior to
any version of this approach.

Using [1] is the original "struct hack", and as Tim pointed out, any
attempt to access text[i] for i>0 had undefined behavior. It probably
works on just about any C90 compiler, but that's merely a historical
fact, it's not something the C standard ever guaranteed.

Using [0] for the length is a constraint violation, but allowing it to
be used for this purpose was a common extension to C. The
implementations that provided this extension defined its behavior, but
the C standard itself never did so.

Using [] for this purpose is a feature that was added in C99; an array
declared in that fashion as the last member of a struct is called a
flexible array member. And it's only the use of flexible array members
for this purpose that has defined behavior.
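A minimal sketch of the C99 form, assuming a hypothetical type fam_string and helper fam_string_new (neither comes from the code posted in this thread):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration; text[] is a C99 flexible array member. */
struct fam_string {
    size_t length;
    char text[];          /* must be the last member; no size is given */
};

/* Allocates the header plus room for the characters of src and a
   terminating NUL. Returns NULL on allocation failure; the caller
   releases the result with free(). */
struct fam_string *fam_string_new(const char *src)
{
    size_t len = strlen(src);
    struct fam_string *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->length = len;
    memcpy(s->text, src, len + 1);   /* copies the NUL as well */
    return s;
}
```

With this declaration, s->text[i] is defined for every i that fits in the storage actually allocated. That is the guarantee the [1] version never had.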

Post by j***@gmail.com
... But that was declared in all cases to be equivalent
to char *, so it means something else, now. ...

Who declared that? No version of the C standard does so.

Ben Bacarisse

2017-05-18 19:02:14 UTC

Post by Tim Rentsch
[...]

Post by j***@gmail.com
An indefinite array type has more limitations than a type of an
array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element
when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

typedef struct string_header_s
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;
And before you C++ types came around, we just left the array size
out: char text[].

No, the old idiom was an array of length 1, as above. The form sanctioned by
the newer C standards (C99 and up) is to leave the size out, provided
this is the last member of a struct. (I don't know what C++ has to do
with it -- I took "you C++ types" to refer to people, but I don't know
who.)

Post by j***@gmail.com
But that was declared in all cases to be equivalent
to char *, so it means something else, now. And if that's what you
mean in such a header, you say char *, instead.

No. char x[] and char *x mean the same thing in a parameter declaration
but not elsewhere, and definitely not in a struct!

<snip>

Post by j***@gmail.com
typedef struct string_header_s
{ short length;
char string[ 1 ];
} string_header_t;
char bigblock[ 100000 ];
char * here = bigblock;
string_header_t * string_allocate( long length )
{ char * place = here;;
if ( ( place = here + length + sizeof (string_header_t) ) >= bigblock + 100000 )

This test is, technically, wrong. Most people would not worry about it,
but once you know what C guarantees and what it does not guarantee it's
hard to go back[1]. I'd just keep a counter of the bytes used to avoid
a potentially undefined pointer construction and test.

Post by j***@gmail.com
{ return NULL;
}
here = place;
( (string_header_t *) place )->length = length;
return (string_header_t *) place;
}

You also have potential alignment problems.

<snip>

[1] You can only generate pointers that point into, or just past, an
array, and pointers can only be compared with <, <=, > and >= if they
point into the same object (or just past).
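A rough sketch of the counter approach suggested above, with made-up names (pool, pool_used, pool_alloc). It compares sizes only, so no pointer beyond one-past-the-end is ever formed. It makes no attempt to fix the alignment problem, which would need the request rounded up separately.

```c
#include <stddef.h>

#define POOL_SIZE 100000u

static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;   /* bytes handed out so far; never exceeds POOL_SIZE */

/* Returns a pointer to nbytes of pool storage, or NULL if the request
   does not fit. The test is done on byte counts, not on pointers. */
void *pool_alloc(size_t nbytes)
{
    if (nbytes > POOL_SIZE - pool_used)   /* cannot wrap: pool_used <= POOL_SIZE */
        return NULL;
    void *p = pool + pool_used;
    pool_used += nbytes;
    return p;
}
```

Writing the test as nbytes > POOL_SIZE - pool_used rather than pool_used + nbytes > POOL_SIZE also keeps the arithmetic itself from overflowing for very large requests.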

--
Ben.

j***@gmail.com

2017-05-19 07:20:31 UTC

Post by Tim Rentsch
[...]

Post by j***@gmail.com
An indefinite array type has more limitations than a type of an
array with one element.

In some ways it has more, in other ways it has fewer.

Post by j***@gmail.com
Both, of course, can refer to an array of more than one element
when instantiated.

What do you mean? AFAIK there is no way to "instantiate" an
indefinite array type.

Bad phrasing, I guess. "... when in use." probably would have been
more accurate.

Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

typedef struct string_header_s
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;
And before you C++ types came around, we just left the array size
out: char text[].

Post by j***@gmail.com
But that was declared in all cases to be equivalent
to char *, so it means something else, now. And if that's what you
mean in such a header, you say char *, instead.

No. char x[] and char *x mean the same thing in a parameter declaration
but not elsewhere, and definitely not in a struct!

Yeah, parameters. You could increment a parameter declared that way as
if it were a pointer, but you couldn't assign a struct member declared
empty. I keep forgetting.

I have this vague memory that the reasoning behind the array size 1 hack
was to avoid conflation with the tendency to assume a size left out was
a pointer.

Post by Ben Bacarisse
<snip>

This test is, technically, wrong.

Erk. Forgetting that pointers in non-linear address spaces can wrap weird.

Okay, ignoring the potential alignment problems you point out below,

if ( ( bigblock + 100000 - here - length - sizeof (string_header_t) ) <= 0 )

and that won't help me if here somehow ends up pointing outside the array.

Post by Ben Bacarisse
Most people would not worry about it,
but once you know what C guarantees and what it does not guarantee it's
hard to go back[1].

I'm proof that ain't true, I guess. :-/

I've sometimes been of a mind that addresses involving truncated segment
registers should have been treated as something other than pointers.

Post by Ben Bacarisse
I'd just keep a counter of the bytes used to avoid
a potentially undefined pointer construction and test.

I have this love-hate relationship with counters. I'm not sure why.

Might be the one-offs.

Post by j***@gmail.com
{ return NULL;
}
here = place;
( (string_header_t *) place )->length = length;
return (string_header_t *) place;
}

You also have potential alignment problems.

<8-o

Post by Ben Bacarisse
<snip>
[1] You can only generate pointers that point into, or just past, an
array, and pointers can only be compared with <, <=, > and >= if they
point into the same object (or just past).
--
Ben.

Thanks for looking at the code.

And I guess I should acknowledge that Keith and others were trying to
point out that the standard had flip-flopped on which hack to use for
indeterminant, that is, flexible, arrays while I was busy working with
other languages in the previous decade.

--
Joel Rees

Randing rantomly:
http://reiisi.blogspot.com

Keith Thompson

2017-05-19 16:51:54 UTC

[...]

Post by Ben Bacarisse
No. char x[] and char *x mean the same thing in a parameter declaration
but not elsewhere, and definitely not in a struct!

Yeah, parameters. You could increment a parameter declared that way as
if it were a pointer, but you couldn't assign a struct member declared
empty. I keep forgetting.

A parameter declared that way *is* a pointer. For example, this:
void func(int param[42]);
really means this:
void func(int *param);
`param` simply is not an array. (And the 42 is quietly ignored,
which is unfortunate.)
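A small sketch that makes the adjustment visible without relying on sizeof. The names sum_ints and sum_ints_ptr are invented for the example.

```c
#include <stddef.h>

/* The [42] is not enforced: the parameter type is adjusted to int *. */
long sum_ints(int param[42], size_t n)
{
    int **pp = &param;      /* accepted only because param has type int * */
    long total = 0;
    (void)pp;
    while (n-- > 0)
        total += *param++;  /* a real array could not be incremented */
    return total;
}

/* The function's type is long (int *, size_t), so this initialization
   needs no cast. */
long (*const sum_ints_ptr)(int *, size_t) = sum_ints;
```

If param really were an array of 42 ints, &param would have type int (*)[42] and the initialization of pp would need a diagnostic. Some compilers do use the declared bound for warnings at call sites, but the language itself attaches no meaning to it.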

Post by j***@gmail.com
I have this vague memory that the reasoning behind the array size 1 hack
was to avoid conflation with the tendency to assume a size left out was
a pointer.

There was a time, very early in the design of C, when [] was used
to declare pointers, and defining an array object would create an
implicit pointer object pointing to the array's initial element.
Dennis Ritchie changed that because it didn't work when structures
have members that are arrays. For example, if you have a structure
like this:

struct foo {
int arr[10];
};

and the definition of arr creates a pointer, then copying the structure
would copy the pointer, and the copy would point back to the original
object.
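In C as it ended up, the array's elements are stored inside the structure, so assignment copies them. A minimal sketch, where copy_is_independent is a name made up for the example:

```c
struct foo {
    int arr[10];
};

/* Copies a struct by assignment, then modifies the copy's array.
   Returns 1 if the original is unaffected and the copy holds its own
   elements, 0 otherwise. */
int copy_is_independent(void)
{
    struct foo original = { { 1, 2, 3 } };   /* remaining elements are zero */
    struct foo copy = original;              /* copies all ten elements */
    copy.arr[0] = 99;
    return original.arr[0] == 1 && copy.arr[0] == 99 && copy.arr[1] == 2;
}
```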

This change was made some time around the transition from "NB"
to C. See <https://www.bell-labs.com/usr/dmr/www/>, and search for
"Embryonic C".

Unfortunately, the myth that C arrays are really just pointers has
survived long past the time when there was any truth to it.

Suggested reading: Section 6 of the comp.lang.c FAQ,
<http://www.c-faq.com/>. (And then the rest of it.)

[...]

Post by j***@gmail.com
And I guess I should acknowledge that Keith and others were trying to
point out that the standard had flip-flopped on which hack to use for
indeterminant, that is, flexible, arrays while I was busy working with
other languages in the previous decade.

I wouldn't say the standard flip-flopped. The C89/C90 standard doesn't
mention the struct hack. C programmers invented it, and it happened to
work. C99 introduced flexible array members as a standard-endorsed way
to achieve the same effect while avoiding undefined behavior.

s***@casperkitty.com

2017-05-19 17:31:56 UTC

The struct hack was in use long before C89 was published, and if there
are two ways of reading the Standard--one of which would irreparably break such
code and semantically devastate the language, and one of which would be
compatible with it--common sense would suggest that the authors of the
Standard intended the latter (unless they were lying about wanting to
avoid breaking existing code).

It would have been entirely reasonable for C99 to officially deprecate the
struct hack, since it offered a practical and mostly-superior alternative,
but it is not reasonable to describe such a feature as offering major new
semantics beyond what had been added 25 years before when C acquired the
ability to nest array types in structures. The only way FAM could be seen
as offering major new semantics would be by interpreting C89 as a major step
back relative to the various forms of the language that had been produced
in the preceding 15 years.

Ike Naar

2017-05-19 19:48:20 UTC

Post by j***@gmail.com
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct string_header_s
{ short length;
char string[ 1 ];
} string_header_t;
char bigblock[ 100000 ];
char * here = bigblock;
string_header_t * string_allocate( long length )
{ char * place = here;;
if ( ( place = here + length + sizeof (string_header_t) ) >= bigblock + 100000 )

It's undefined behaviour to produce a pointer beyond one-past-the-end of an array.
So 'here + length + sizeof (string_header_t)' has undefined behaviour if
'(here - bigblock) + length + sizeof (string_header_t) > sizeof bigblock'.

Tim Rentsch

2017-05-20 15:47:13 UTC

[...pre-C99 flexible array members...]
Ah. In that case I must demur regarding part of your later
statement. An access through an array type of extent 1 is
not definedly allowed to access elements after the element
at index 0.

I see you have walked back some comments about the behavior of
indexing, so I won't revisit that, but I wanted to respond on
some other code items. (Some code below reformatted for line
length.)

Post by j***@gmail.com
typedef struct string_header_s
{ short length; /* limited length */
char text[ 1 ];
} string_header_t;
char bigblock[ 100000 ];
char * here = bigblock;
string_header_t * string_allocate( long length )
{ char * place = here;;
if (
( place = here + length + sizeof (string_header_t) )
>= bigblock + 100000 )

{ return NULL;
}
here = place;
( (string_header_t *) place )->length = length;
return (string_header_t *) place;
}

I thought it might be helpful to show how this function could be
written to avoid some of the problems it has (which I think you
mostly know about already). Please excuse changes in using white
space. (Disclaimer: not compiled.)

string_header_t *
string_allocate( long length ){
string_header_t *it = (void*) here;
long it_size = sizeof *it;
long needed = (1 + (it_size + length - 1) / it_size) * it_size;
char *block_limit = bigblock + sizeof bigblock;

if( block_limit - here < needed ) return NULL;

here += needed;
return it->length = length, it;
}

The initializer for 'needed' shows how to make sure the amount
allocated is a suitable multiple of the struct size, ie, by
rounding up.

The if() test shows how to check if enough space is available
without running into problems of "pointer overflow".

And also for this routine, because it needs to protect against
allocation failure (also not compiled):

string_header_t *
string_save( char string[] ){
long n = strlen( string );
string_header_t *it = string_allocate( n );
return it ? memcpy( it->string, string, n+1 ), it : NULL;
}

Patrick.Schluter

2017-05-27 16:50:18 UTC

Post by Keith Thompson
I suspect that this is used to achieve a kind of semi-fake
pass-by-reference semantics.

An alternative possibility is that the array size in this code was not
always 1, or that the author thinks it might not always be 1 in the future.

The va_list type on Linux x86_64 is implemented that way: an array of one
struct holding the various pointers and offsets.
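A minimal sketch of the array-of-one pattern, modelled on the shape of the mpz_t typedef. The names counter_struct, counter_t, counter_add and counter_demo are hypothetical and not taken from GMP or any libc.

```c
/* Hypothetical types with the same shape as "typedef __mpz_struct mpz_t[1];" */
typedef struct {
    long value;
} counter_struct;

typedef counter_struct counter_t[1];

/* The parameter type is adjusted to counter_struct *, so the callee
   modifies the caller's object. */
void counter_add(counter_t c, long n)
{
    c->value += n;
}

long counter_demo(void)
{
    counter_t total = { { 0 } };   /* automatic storage for one struct */
    counter_add(total, 5);         /* no & needed: the array decays to a pointer */
    counter_add(total, 7);
    return total->value;
}
```

Here counter_demo() returns 12. The caller declares what looks like a plain variable and passes it by name, yet the callee receives a pointer.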

j***@gmail.com

2017-05-12 13:17:26 UTC

Post by fl
Hi,
I see this typedef, which has '[1]' while the below declaration without it.
What use is '[1]'?
Thanks
-----------------
typedef __mpz_struct mpz_t[1];
mpz_t s_divisor;

I was thinking I had some sample source using GMP's arbitrary precision
integers, but it's arbitrary precision floating point.

Just in case it's useful for the usage patterns:

https://ja.osdn.net/users/reiisi/pastebin/4462

And the explanation:

http://joels-programming-fun.blogspot.jp/2016/11/using-gmp-to-approximate-pi.html

(Serious boondoggle code.)

Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

--
Joel Rees

Delusions of being a novelist:
http://reiisi.blogspot.com/p/novels-i-am-writing.html

s***@casperkitty.com

2017-05-12 15:46:48 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

IMHO, the struct hack would have been cleaner in many ways had the authors
of the Standard not opted to disallow arrays of size zero. The only good
reason I can see for disallowing such arrays would have been to ensure the
correct "behavior" of existing code that uses constructs like

struct dummyName { int x[sizeof foo == 50]; };

as a form of compile-time assertion. I recognize that requiring compilers
to accept arrays of size zero [before the Standard, some did and some did
not] could have silently undermined what could be important compile-time
safety checks, but given that compilers would be allowed to warn about
zero-size arrays even if the Standard allowed them, and could have offered
a non-conforming mode that would break the build on such declarations, I
would not think that would have been a major issue.

Otherwise, I think it would have been useful to allow zero-size objects
with the provisos that:

(1) the address of a stand-alone zero-size object may or may not match the
address of any other, at the compiler's discretion, and may be null if the
compiler upholds condition #2; (2) adding any integer to a pointer to a
zero-sized object yields a matching pointer; (3) subtracting two equal
pointers to zero-sized objects will yield an Unspecified integer.

In many cases, all a compiler would have to do to uphold those requirements
would simply be to check that array sizes were non-negative, rather than
checking whether they were greater than zero; except for the existence of
code that uses the aforementioned pattern for "static assertions" I don't
see any real disadvantage to such allowance.
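For comparison, a sketch of the compile-time assertion idiom in a form that does not depend on zero-size arrays being rejected. The failing case uses a size of -1, which is a constraint violation whether or not the compiler accepts [0] as an extension. The macro name COMPILE_TIME_CHECK and the typedef names are invented for the example.

```c
#include <limits.h>

/* Pre-C11 idiom: the array size is 1 when the condition holds and -1
   (a constraint violation, so a diagnostic is required) when it does not. */
#define COMPILE_TIME_CHECK(name, cond) \
    typedef char name[(cond) ? 1 : -1]

COMPILE_TIME_CHECK(char_has_at_least_8_bits, CHAR_BIT >= 8);
COMPILE_TIME_CHECK(int_is_at_least_16_bits, sizeof(int) * CHAR_BIT >= 16);

/* C11 and later provide the same kind of check directly. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(long) >= sizeof(int), "long is narrower than int");
#endif

/* The typedef is an ordinary one-element char array type. */
int check_type_size(void)
{
    return (int)sizeof(char_has_at_least_8_bits);
}
```

All three conditions hold on any conforming implementation, so this compiles cleanly. Flipping one of them should produce a diagnostic at the typedef.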

j***@gmail.com

2017-05-13 01:11:30 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

#error was not good enough?

Fortunately, I think I have never seen code like that.

No, wait, I probably have, now that I think about it. But I'm going to
admit to bias here and assert my opinion that code like that doesn't
really deserve to be supported by the standard.

Unless the standard provided the support through an optional switch and
the switch were off by default.

(I understood the reasons they didn't want those switches in the first
version of the standard, but we need them now, for all sorts of things.)

Post by s***@casperkitty.com
as a form of compile-time assertion. I recognize that requiring compilers
to accept arrays of size zero [before the Standard, some did and some did
not] could have silently undermined what could be important compile-time
safety checks, but given that compilers would be allowed to warn about
zero-size arrays even if the Standard allowed them, and could have offered
a non-conforming mode that would break the build on such declarations, I
would not think that would have been a major issue.
Otherwise, I think it would have been useful to allow zero-size objects
(1) the address of a stand-alone zero-size object may or may not match the
address of any other, at the compiler's discretion, and may be null if the
compiler upholds condition #2; (2) adding any integer to a pointer to a
zero-sized object yields a matching pointer; (3) subtracting two equal
pointers to zero-sized objects will yield an Unspecified integer.
In many cases, all a compiler would have to do to uphold those requirements
would simply be to check that array sizes were non-negative, rather than
checking whether they were greater than zero; except for the existence of
code that uses the aforementioned pattern for "static assertions" I don't
see any real disadvantage to such allowance.

I think I'm somewhat in agreement there, but I don't have to check how much.

--
Joel Rees

Randomly ranting:
http://reiisi.blogspot.com

Keith Thompson

2017-05-13 01:37:11 UTC

[...]

Post by s***@casperkitty.com
IMHO, the struct hack would have been cleaner in many ways had the
authors of the Standard not opted to disallow arrays of size zero.
The only good reason I can see for disallowing such arrays would have
been to ensure the correct "behavior" of existing code that uses
constructs like
struct dummyName { int x[sizeof foo == 50]; };

#error was not good enough?

#error didn't exist until C99, if I recall correctly.

[...]

Post by j***@gmail.com
Unless the standard provided the support through an optional switch and
the switch were off by default.

The standard doesn't mention switches. There are some optional
features, but they're provided at the option of the implementation, not
of the user.

[...]

Tim Rentsch

2017-05-13 16:44:31 UTC

On Saturday, May 13, 2017 at 12:46:56 AM UTC+9,

[...]

#error was not good enough?

#error didn't exist until C99, if I recall correctly.

C89/C90 has #error. What might be confusing you is that in
C99 #error requires compilation to fail, whereas in C89/C90
all that is required is a diagnostic. In practical terms
I expect there is no difference, but the requirements are
slightly different in the two cases. In both cases though
a diagnostic is required.
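A minimal sketch of a guarded #error. The condition is chosen so that it never fires on a conforming implementation, and int_is_wide_enough is a made-up name.

```c
#include <limits.h>

/* Every conforming implementation has INT_MAX >= 32767, so this branch is
   never taken; it only shows the shape of a guarded #error. */
#if INT_MAX < 32767
#error "int is narrower than 16 bits; this code assumes at least 16"
#endif

/* Run-time view of the same condition, for comparison. */
int int_is_wide_enough(void)
{
    return INT_MAX >= 32767;
}
```

If the condition were true, both C89/C90 and C99 would require a diagnostic. Only C99 would additionally forbid the translation unit from being translated successfully.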

Keith Thompson

2017-05-13 20:47:32 UTC

[...]

Post by j***@gmail.com
#error was not good enough?

#error didn't exist until C99, if I recall correctly.

C89/C90 has #error.

Quite right. I should have checked before posting. (I wish I had a
PDF copy of the C90 standard that wasn't scanned from a paper copy.
"ansi.c.txt", a draft of the ANSI C89 standard, is useful for quick
checks like this.)

Post by Tim Rentsch
What might be confusing you is that in
C99 #error requires compilation to fail, whereas in C89/C90
all that is required is a diagnostic.

No, I was simply wrong about when #error was introduced. I wasn't
aware of that difference until you mentioned it.

Post by Tim Rentsch
In practical terms
I expect there is no difference, but the requirements are
slightly different in the two cases. In both cases though
a diagnostic is required.

It's interesting that the C90 version of #error seems similar to the
#warning directive, a common non-standard extension (and one that I
wouldn't mind being standardized). But as far as I know, in most C90
compilers #error causes compilation to fail, something that's not
implied by the description in the standard.

Robert Wessel

2017-05-14 05:28:37 UTC

Post by j***@gmail.com
#error was not good enough?

#error didn't exist until C99, if I recall correctly.

C89/C90 has #error.

Post by Tim Rentsch
What might be confusing you is that in
C99 #error requires compilation to fail, whereas in C89/C90
all that is required is a diagnostic.

No, I was simply wrong about when #error was introduced. I wasn't
aware of that difference until you mentioned it.

Post by Tim Rentsch
In practical terms
I expect there is no difference, but the requirements are
slightly different in the two cases. In both cases though
a diagnostic is required.

I don't believe C89 actually requires compilation to fail in *any*
circumstance, just that a diagnostic be issued if certain rules are
violated. C99 specifies one (IIRC) condition in which compilation
must fail (use of #error), but it doesn't actually define what failure
means ("The implementation shall not successfully translate a
preprocessing translation unit containing a #error preprocessing
directive...").

Tim Rentsch

2017-05-14 20:39:03 UTC

Post by j***@gmail.com
#error was not good enough?

#error didn't exist until C99, if I recall correctly.

C89/C90 has #error.

Post by Tim Rentsch
What might be confusing you is that in
C99 #error requires compilation to fail, whereas in C89/C90
all that is required is a diagnostic.

No, I was simply wrong about when #error was introduced. I wasn't
aware of that difference until you mentioned it.

Post by Tim Rentsch
In practical terms
I expect there is no difference, but the requirements are
slightly different in the two cases. In both cases though
a diagnostic is required.

It's interesting that the C90 version of #error seems similar to the
#warning directive, a common non-standard extension (and one that I
wouldn't mind being standardized).

Personally I think the directive being named #error already makes
it different enough from #warning that the behaviors may be
expected to be different, even if the stated semantics for #error
doesn't require them to be so.

Post by Keith Thompson
But as far as I know, in most C90
compilers #error causes compilation to fail, something that's not
implied by the description in the standard.

To me, just naming the directive #error is enough to imply that
the compilation will fail, and I would be surprised by any
implementation that didn't. To be clear, I agree with what you
are saying (or at least what I think you're saying), and don't
mean to say otherwise - my comment is related but is not meant to
be in opposition.

Richard Bos

2017-05-15 13:02:07 UTC

Post by Keith Thompson
But as far as I know, in most C90
compilers #error causes compilation to fail, something that's not
implied by the description in the standard.

To me, just naming the directive #error is enough to imply that
the compilation will fail, and I would be surprised by any
implementation that didn't.

So would I - #4.4 says, literally:

The implementation shall not successfully translate a preprocessing
translation unit containing a #error preprocessing directive unless it
is part of a group skipped by conditional inclusion.

If compilation doesn't fail, surely that is the same thing as saying
that the TU _was_ translated successfully?

Richard

Tim Rentsch

2017-05-15 14:30:40 UTC

Post by Richard Bos

Post by Keith Thompson
But as far as I know, in most C90
compilers #error causes compilation to fail, something that's not
implied by the description in the standard.

To me, just naming the directive #error is enough to imply that
the compilation will fail, and I would be surprised by any
implementation that didn't.

The implementation shall not successfully translate a preprocessing
translation unit containing a #error preprocessing directive unless it
is part of a group skipped by conditional inclusion.
If compilation doesn't fail, surely that is the same thing as saying
that the TU _was_ translated successfully?

The quoted passage appears in C99 but does not appear in C90.
In C90 encountering a #error requires issuing a diagnostic,
but strictly speaking not more than that.

s***@casperkitty.com

2017-05-13 03:45:16 UTC

Post by s***@casperkitty.com
IMHO, the struct hack would have been cleaner in many ways had the authors
of the Standard not opted to disallow arrays of size zero. The only good
reason I can see for disallowing such arrays would have been to ensure the
correct "behavior" of existing code that uses constructs like
struct dummyName { int x[sizeof foo == 50]; };

#error was not good enough?

Even on compilers which support #error (it was added in C99, but some
compilers supported it before that) it is limited to squawking about things
understood by the preprocessor. There would be no way to have an #error
directive squawk if something is the wrong size because the preprocessor
has no clue about structure sizes, and the main processor won't get to do
anything unless all #error directives are eliminated at the preprocessor
stage.
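
To illustrate the limitation described above, here is a small sketch; struct packet_header and packet_header_size are hypothetical names invented for the example. The preprocessor can test macros such as CHAR_BIT, but it cannot evaluate sizeof, because types do not exist yet at that stage.

```c
#include <limits.h>
#include <stddef.h>

struct packet_header {          /* hypothetical struct, illustration only */
    unsigned char tag;
    unsigned char length;
};

/* Fine: CHAR_BIT is a macro, so the preprocessor can test it. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

/* Not possible: in a directive such as

       #if sizeof(struct packet_header) != 2
       #error "unexpected header size"
       #endif

   the identifiers are not macros, so the preprocessor replaces them
   with 0 and is left with an invalid expression. The size is known
   only to the compiler proper. */
size_t packet_header_size(void)
{
    return sizeof(struct packet_header);
}
```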

Using array sizes as a form of static assert is icky, but the Standard does
not really provide much direct alternative. On some compilers, a statement
like

    if (!condition) {
        extern void something_bad_happened(void);
        something_bad_happened();
    }

will lead to a linker squawk if no function of the indicated name exists
and the compiler can't prove that condition is definitely true, but on some
implementations such code would generate a linker squawk regardless. My
preference is to use some variation of the array-dimension trick but use
an expression that always evaluates to a positive value or a negative value
and doesn't rely upon any particular behavior with an array size of zero.
While I don't know to what extent the existence of code that uses array
sizes as assertions was a factor in the decision to disallow zero-sized
objects, I can't think of any other advantage to disallowing them.
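
One possible shape for such a variation, as a sketch only: the macro name ASSERT_AT_COMPILE_TIME and the conditions checked are assumptions chosen for illustration, and the conditions are ones that hold on mainstream platforms.

```c
/* Declares an array type whose size is 1 when cond holds and -1 when
   it does not. A negative array size is a constraint violation, so a
   false condition must be diagnosed. Zero is never used as a size. */
#define ASSERT_AT_COMPILE_TIME(name, cond) \
    typedef char name[(cond) ? 1 : -1]

ASSERT_AT_COMPILE_TIME(int_is_at_least_two_bytes, sizeof(int) >= 2);
ASSERT_AT_COMPILE_TIME(long_not_smaller_than_int,
                       sizeof(long) >= sizeof(int));

/* Returns the size of one of the array types declared above, which is
   1 whenever the file compiled at all. */
int checks_passed(void)
{
    return (int)sizeof(int_is_at_least_two_bytes);
}
```

Changing one of the conditions to something false, for example sizeof(char) == 2, should stop the build at that line.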

Tim Rentsch

2017-05-13 16:57:44 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

How the Standard supports that declaration now is to say that it
will be accepted if 'sizeof foo' is a compile-time constant and
equal to 50, but otherwise requires a diagnostic. In what way
do you think that specification should be changed? IMCO it is
fine just as it is.

Philip Lantz

2017-05-13 19:07:32 UTC

Post by s***@casperkitty.com
IMHO, the struct hack would have been cleaner in many ways had the authors
of the Standard not opted to disallow arrays of size zero. The only good
reason I can see for disallowing such arrays would have been to ensure the
correct "behavior" of existing code that uses constructs like
struct dummyName { int x[sizeof foo == 50]; };

Prior to the addition of _Static_assert, this was a way to obtain a
similar capability. #error cannot be used to do the same thing. (Clearly
_Static_assert is a huge improvement, which is why it was added.)
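
For comparison, a minimal C11 sketch of the same kind of check; struct foo_like is a hypothetical stand-in for the 'foo' in the quoted construct, and foo_like_size exists only so the result can be inspected.

```c
#include <limits.h>

/* Hypothetical stand-in for the object called 'foo' in the thread. */
struct foo_like {
    char bytes[50];
};

/* C11: the condition is checked during translation, and the string is
   included in the diagnostic if the condition is false. */
_Static_assert(sizeof(struct foo_like) == 50, "foo_like must be 50 bytes");
_Static_assert(CHAR_BIT == 8, "8-bit bytes assumed");

unsigned long foo_like_size(void)
{
    return (unsigned long)sizeof(struct foo_like);
}
```

Unlike the array-size trick, this declares no dummy type or member, and the failure message says what went wrong.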

There is probably still lots of code around that uses this method, and it
would be inappropriate for the standard to change in a way that would make
that code no longer work as intended.

Philip

David Brown

2017-05-14 13:34:30 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

Code like that is perfectly reasonable - /if/ it is hidden within an
appropriate macro:

#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
#define STATIC_ASSERT(claim, warning) \
    typedef struct { \
        char STATIC_ASSERT_NAME_(__COUNTER__) [(claim) ? 1 : -1]; \
    } STATIC_ASSERT_NAME_(__COUNTER__)

(__COUNTER__ is available on some compilers - replace it with __LINE__
for standard compliance.)

With C11, you have _Static_assert instead - but this macro works with
previous C standards to give you a compile-time error when the static
assertion fails.
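
A usage sketch of the macro, in the __LINE__ variant so that it needs no compiler extension. The two claims and the function static_asserts_compiled are assumptions chosen for illustration.

```c
#define STATIC_ASSERT_NAME_(line) STATIC_ASSERT_NAME2_(line)
#define STATIC_ASSERT_NAME2_(line) assertion_failed_at_line_##line
/* The 'warning' argument is not expanded anywhere; it only documents
   the claim at the point of use. */
#define STATIC_ASSERT(claim, warning) \
    typedef struct { \
        char STATIC_ASSERT_NAME_(__LINE__) [(claim) ? 1 : -1]; \
    } STATIC_ASSERT_NAME_(__LINE__)

/* Each use must sit on its own source line, because the generated
   typedef name is derived from __LINE__. */
STATIC_ASSERT(sizeof(short) <= sizeof(int), "short must fit in int");
STATIC_ASSERT(sizeof(char) == 1, "char is one byte by definition");

/* Reaching this function at run time means both claims held when the
   file was compiled. */
int static_asserts_compiled(void)
{
    return 1;
}
```

With __LINE__, the struct member and the typedef get the same identifier. That is allowed because members and typedef names live in different name spaces, but two assertions on the same line number within one translation unit (for example in two headers) would clash.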

Tim Rentsch

2017-05-14 19:48:59 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

This sounds like hyperbole. If it isn't then you aren't thinking
hard enough.

s***@casperkitty.com

2017-05-15 14:39:19 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

This sounds like hyperbole. If it isn't then you aren't thinking
hard enough.

I said the only good reason I could see. Since I could see no other good
reasons, the statement was accurate. You presumably see other ways in which
disallowing zero-size arrays improves the language. Care to enlighten me?

Tim Rentsch

2017-05-24 15:23:02 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

This sounds like hyperbole. If it isn't then you aren't thinking
hard enough.

I said the only good reason I could see. Since I could see no
other good reasons, the statement was accurate.

Yes, and I said that sounds like hyperbole, which is also an
accurate statement.

Post by s***@casperkitty.com
You presumably see other ways in which disallowing zero-size
arrays improves the language. Care to enlighten me?

Based on past experience I have to think that what you're looking
for is not an answer but an argument. I have no interest in
getting into an argument. But if you want to stop by I might be
willing to try hitting you over the head with a stick until
you become enlightened. :)

s***@casperkitty.com

2017-05-24 16:17:36 UTC

Post by j***@gmail.com
Anyway, an array of one is most commonly a signal that you are
allocating and using variable length arrays.

This sounds like hyperbole. If it isn't then you aren't thinking
hard enough.

I said the only good reason I could see. Since I could see no
other good reasons, the statement was accurate.

Yes, and I said that sounds like hyperbole, which is also an
accurate statement.

Post by s***@casperkitty.com
You presumably see other ways in which disallowing zero-size
arrays improves the language. Care to enlighten me?

I stated that the only good reason I could see was to facilitate an
assertion construct. You stated I wasn't thinking hard enough.
That would suggest that there are other reasons which you regard as
obvious and should presumably have no trouble stating, and yet you
continually refuse to state them. And I'm the one being needlessly
argumentative!?

Tim Rentsch

2017-05-27 13:15:12 UTC