Post by Ben Bacarisse
<snip>
Post by Keith Thompson
Here's a demo program, based on your declaration above with the syntax
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
typedef struct string_header_s {
short length; /* limited length */
char text[ 1 ];
} string_header_t;
string_header_t *p = malloc(sizeof *p + 10); /* more than we need */
if (p == NULL) exit(EXIT_FAILURE);
p->length = 4;
strcpy(p->text, "abcd");
printf("p->text[3] = '%c'\n", p->text[3]);
}
When I compile and run this on my system, I get no compile-time
p->text[3] = 'd'
But both the strcpy() call and the reference to p->text[3] have
undefined behavior. Concentrating on the latter, the indexing
operator [] is defined in terms of pointer addition, in this case
(p->text + 3). The description of pointer addition, in N1570
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.
The array object in question is p->text, which is of type char[1].
We've used malloc to allocate additional memory past the end of
that array object, but it's not part of the array object.
I wasn't sure where to jump into this thread so here seems
as good a place as any.
Post by Ben Bacarisse
I think something might be missing from the standard in the area. I
int *ip = malloc(40); // [earlier size bug corrected]
if (ip) ip[3] = 42;
but what is the array in question here? The storage allocated by malloc
has no effective type (yet) so how can ip and ip+3 point to elements of
the same array? (I've used an int * because there are special
dispensations for accessing any object's bytes as a sequence of chars.)
Assuming 'sizeof (int) == 4', this malloc call returns a pointer
to a space that can be used as an array of 10 ints. Given that,
the index operation, and subsequent assignment operation, have
defined behavior. Asking "what is the array" is not important,
because the relevant passage for malloc(), etc, gives the right
to use the returned pointer value in the same way that a pointer
to the start of an array could be used.
Post by Ben Bacarisse
/*[previously]*/ string_header_t *p = malloc(sizeof *p + 10);
p->text[3] = 'x'; // invalid
char *cp = (void *)p;
cp += offsetof(string_header_t, text);
cp[3] = 'x'; // this one OK?
I'd say yes
In my view there is no question that the behavior involving 'cp'
is defined, actually for two different reasons. The more subtle
reason I will discuss below. The more obvious reason is that for
any object, including the implied array object pointed to by 'p',
it's always allowed to convert a pointer to the start of the
object to a pointer-to-character type, and access the object as
an array of bytes. (Some people might quibble about having to
increment pointers one-by-one to access the characters, but I'm
going to ignore that since it doesn't pass the laugh test.)
Surely the key provision in the first paragraph regarding memory
management functions, ie, the one that says the returned pointer
from malloc() and friends may be
used to access such an object or an array of such objects in
the space allocated
is meant to include the ability to access the same space as a
character array. The operations on 'cp' are allowed under that
umbrella.
Post by Ben Bacarisse
and I think my answer would stay the same if the 'text'
member and the pointer access were to int types rather than char.
(You'd need a char * to do the first offsetof addition, of course.)
Here we get to the heart of the matter, and the more subtle
reasoning alluded to above. Again I believe the behavior
is defined (with a very small asterisk covering a DS9000
type implementation, as explained further below).
Suppose first, as will most commonly be true, that the offset of
the int member in question is a multiple of sizeof (int). We are
allowed to take the return value of malloc() and convert it to
an 'int *', and use that to access an array of ints. Casting the
pointer 'p' to (void *) gives that same value back (please no
quibbles about "same" versus "equal"), which lets us use that
value to access an array of ints. So something like this
int *ip = (void *)p;
ip += offsetof(string_header_t, ints) / sizeof (int);
ip[3] = 42;
has to work. (Using a char * initially and using a byte offset
to increment that pointer is not an important difference.)
Let me say this again more directly, to convey the impact of what
the Standard says about malloc() values. We may do this:
void *v = malloc( 10000 );
int *ip = v;
long *lp = v;
float *fp = v;
string_header_t *shp = v;
after which /all/ of these pointers may be used to access arrays
of their respective types. As long as we don't run afoul of
effective type rules, pointers to space returned by malloc() may
be freely mixed and matched, intermingled, converted to char *
and adjusted by byte lengths, etc, and everything works. In
effect, the type of space returned by malloc() is a union over
/all possible types/ that will fit in the space, including array
types. Treating part of the space one way and another part a
different way is allowed in all cases (again, assuming no
interference from effective type rules or struct/union type
overlap). It is this broad guarantee that gives workaround
code like that shown above defined behavior.
Now for the asterisk. Suppose the byte offset for the "struct
hack" member (of type int[1]) is not a multiple of sizeof(int).
This means an int array starting at the beginning of the malloc()
space doesn't do the job for us. Normally this won't be a
problem, since in addition to
struct { short s; int i1[1]; }
there will be types like
struct { short s; int i10[10]; }
struct { short s; int i100[100]; }
struct { short s; int i1000[1000]; }
with the same offset for the int array members as that of the
first "struct hack" type. However, a perverse implementation
could choose one offset for the extent 1 case and a different one
for all cases with extent greater than 1, which means in principle the
workaround access scheme would give undefined behavior on such
implementations. Not counting such far-fetched scenarios however
the behavior is defined.
(It occurs to me now I should mention one other thing, namely,
storing into a member of a struct gives the freedom to put
indeterminate values into bytes of the struct that don't
correspond to other members. So if there is padding after the
"struct hack" (SH) member, and a struct member is stored into after
elements of the extended SH array have been written in those padding
locations, then that is also a potential conflict. Probably not
likely to make a
difference, but I try to be thorough.)