[std-discussion] Value of padding bytes

Discussion:

Hyman Rosen

2017-11-22 16:56:29 UTC

Does the following code have undefined behavior (when there are padding
bytes)?

I would say no, for a variety of reasons: objects may be accessed through
char glvalues, the bytes of an object can be copied out and copied back in,
the C++ memory model is intended to be compatible with C, etc. But I don't
see an explicit statement in the Standard.

Tools like valgrind complain about accessing uninitialized data in cases
like this too, which is annoying.

struct X { char c; double d; };
int main() {
    X x = { 'a', 1.1 };
    return reinterpret_cast<char*>(&x)[1] == 0;
}
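
To make "when there are padding bytes" concrete, the following is a minimal sketch that measures the gap between the two members of X. The helper names (padding_after_c, byte1_is_padding, check_layout) are hypothetical and not part of the original post; the actual amount of padding is implementation-defined, so the sketch only checks relations that hold everywhere.

```cpp
#include <cassert>
#include <cstddef>

struct X { char c; double d; };

// Bytes between the end of `c` and the start of `d`; they belong to no member.
constexpr std::size_t padding_after_c() {
    return offsetof(X, d) - (offsetof(X, c) + sizeof(char));
}

// True when byte index 1 (the byte the program above reads) is padding.
constexpr bool byte1_is_padding() {
    return offsetof(X, d) > 1;
}

// Hypothetical self-check: relations that do not depend on the platform.
inline void check_layout() {
    assert(offsetof(X, c) == 0);  // first member of a standard-layout struct
    assert(padding_after_c() == offsetof(X, d) - 1);
    assert(byte1_is_padding() == (padding_after_c() > 0));
}
```

On a typical implementation where double is 8-byte aligned, padding_after_c() would be 7, but that figure is an assumption about the platform, not something the Standard promises.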

--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.

Nicol Bolas

2017-11-22 17:59:37 UTC

Post by Hyman Rosen
Does the following code have undefined behavior (when there are padding
bytes)?
I would say no,

If it doesn't have UB, then what *is* its behavior? What is the return
value of that function? If you can't point to a place in the standard that
explains what that will be, then the behavior by definition has to be
undefined.

Post by Hyman Rosen
for a variety of reasons - objects may be accessed through char glvalues,
the bytes of an object can be copied out and copied back in,

That doesn't mean you get to look at what you're copying around. What the
standard says is that if you copy the object representation of `X`
somewhere, and then copy it back into some other `X`, then the other `X`
will have the same value representation as the original `X` that you copied.

The standard does not define what the values of those bytes are. Only that
these bytes, when assembled in order within an object `X`, will cause
that object to assume the value of the original `X`.
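
The copy-out, copy-back guarantee described here can be sketched as follows. This is only an illustration under the assumption that X is the trivially copyable struct from the original post; round_trip and check_round_trip are hypothetical names. The sketch moves the bytes without ever inspecting them and then looks only at the members.

```cpp
#include <cassert>
#include <cstring>

struct X { char c; double d; };

// Copy the object representation into a byte buffer and back into another X.
inline X round_trip(const X& src) {
    unsigned char buf[sizeof(X)];
    std::memcpy(buf, &src, sizeof(X));   // copy out
    X dst;
    std::memcpy(&dst, buf, sizeof(X));   // copy back in
    return dst;
}

// Hypothetical self-check: only the members (the value) are compared.
inline void check_round_trip() {
    X a = { 'a', 1.1 };
    X b = round_trip(a);
    assert(b.c == 'a');
    assert(b.d == 1.1);
}
```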

Post by Hyman Rosen
the C++ memory model is
intended to be compatible with C, etc., but I don't see an explicit
statement in the Standard.
Tools like valgrind complain about accessing uninitialized data in cases
like this too, which
is annoying.

As well they should. That memory has not been initialized. Aggregate
initialization initializes the *value representation* of an object. The
bytes of an object's object representation which are not associated with
its value representation do not have well-defined values. They have *some*
value, but not a defined value.

Hyman Rosen

2017-11-22 19:57:30 UTC

Post by Nicol Bolas
If it doesn't have UB, then what *is* its behavior? What is the return
value of that function? If you can't point to a place in the standard that
explains what that will be, then the behavior by definition has to be
undefined.

I believe that is false.

[intro.execution] says
*Certain other operations are described in this International Standard as
undefined (for example, the effect of attempting to modify a const object).
[ Note: This International Standard imposes no requirements on the behavior
of programs that contain undefined behavior. —end note ]*

I understand that to mean that things are not undefined unless the standard
says they're undefined.
Undefined behavior is opt-in, not opt-out.

Post by Hyman Rosen
for a variety of reasons - objects may be accessed through char glvalues,
the bytes of an object can be copied out and copied back in,

Post by Nicol Bolas
That doesn't mean you get to look at what you're copying around.

That's nonsense. Where does the standard imply permission to copy but not
look?
What would that even mean? I can write a copy routine that does

void my_copy(char *to, char *from) {
    switch (*from) {
    case 0: *to = 0; break;
    case 1: *to = 1; break;
    // ...
    }
}

By what evidence from the Standard would this be illegal?

Post by Nicol Bolas
The standard does not define what the values of those bytes are. Only that
these bytes, when assembled in order within an object `X`, will cause
that object to assume the value of the original `X`.

They don't need to have well-defined values in order to make looking at
them legal,
any more than the values returned by a clock or random-number generator have
well-defined values.

Ville Voutilainen

2017-11-22 20:19:18 UTC

Post by Hyman Rosen

If it doesn't have UB, then what is its behavior? What is the return value
of that function? If you can't point to a place in the standard that
explains what that will be, then the behavior by definition has to be
undefined.

I believe that is false.
[intro.execution] says
Certain other operations are described in this International Standard as
undefined (for example, the effect of
attempting to modify a const object). [ Note: This International Standard
imposes no requirements on the
behavior of programs that contain undefined behavior. —end note ]
I understand that to mean that things are not undefined unless the standard
says they're undefined.
Undefined behavior is opt-in, not opt-out.

Nice try, but completely incorrect. The definition of UB is

undefined behavior
behavior for which this document imposes no requirements

Hyman Rosen

2017-11-22 21:13:02 UTC

On Wed, Nov 22, 2017 at 3:19 PM, Ville Voutilainen <

Post by Ville Voutilainen
Nice try, but completely incorrect. The definition of UB is
undefined behavior
behavior for which this document imposes no requirements

In the C++ draft Standard document N4659, notice [atomics.types.operations].

The class template std::atomic<T> can be instantiated with trivially
copyable types T. The compare_exchange_... member functions are defined
to compare the *memory* of the objects, with a Note showing that this is
done as if by *memcmp*. There is a further note specifically warning that
these *memcmp* semantics may cause failed comparisons for underlying types
with padding bits.

Thus, even though Notes are not normative, this is certainly a strong
indication that the Standard does not regard comparison of padding as
undefined behavior.
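
The memcmp-style comparison mentioned in those notes can be illustrated without std::atomic. The sketch below is not the library's implementation; same_bytes, Pair and check_same_bytes are hypothetical names, and the code assumes C++17 for std::has_unique_object_representations_v. It refuses, at compile time, to compare types whose representation may contain padding, which sidesteps the question debated in this thread.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Compare two objects byte for byte, as memcmp would.
template <class T>
bool same_bytes(const T& lhs, const T& rhs) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "bytewise comparison needs a trivially copyable type");
    static_assert(std::has_unique_object_representations_v<T>,
                  "type may contain padding; bytewise comparison is unreliable");
    return std::memcmp(&lhs, &rhs, sizeof(T)) == 0;
}

// Two ints of equal alignment: no padding on common implementations.
struct Pair { int a; int b; };

// Hypothetical self-check.
inline void check_same_bytes() {
    Pair p = { 1, 2 }, q = { 1, 2 }, r = { 1, 3 };
    assert(same_bytes(p, q));
    assert(!same_bytes(p, r));
}
```

Calling same_bytes on the struct X from the original post would be rejected by the second static_assert on implementations where X has padding.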

Ville Voutilainen

2017-11-22 21:29:54 UTC

Post by Hyman Rosen
On Wed, Nov 22, 2017 at 3:19 PM, Ville Voutilainen

Post by Ville Voutilainen
Nice try, but completely incorrect. The definition of UB is
undefined behavior
behavior for which this document imposes no requirements

In the C++ draft Standard document N4659, notice [atomics.types.operations].
The class template std::atomic<T> can be instantiated with trivially
copyable types T.
The compare_exchange_... member functions are defined to compare the memory
of the objects, with a Note showing that this is done as if by memcmp.
There is a
further note specifically warning that these memcmp semantics may cause
failed
comparisons for underlying types with padding bits.
Thus, even though Notes are not normative, this is certainly a strong
indication that
the Standard does not regard comparison of padding as undefined behavior.

You might want to take a look at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0528r0.html.

Hyman Rosen

2017-11-22 22:43:07 UTC

On Wed, Nov 22, 2017 at 4:29 PM, Ville Voutilainen <

Post by Ville Voutilainen
You might want to take a look at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0528r0.html.

The paper proposes to disallow compare_exchange_... on types with padding
bits because the results are problematic due to the unknown nature of those
bits. It does not say that those bits cannot be examined (and indeed, their
sample program does so).

Thiago Macieira

2017-11-22 18:09:17 UTC

Post by Hyman Rosen
Does the following code have undefined behavior (when there are padding
bytes)?

With one modification, I'd say it's indeterminate behaviour, not UB. The value
of padding bytes is unspecified, but it is valid to access it, so it can't be
UB.

Post by Hyman Rosen
struct X { char c; double d; };
int main() {
X x = { 'a', 1.1 };
return reinterpret_cast<char*>(&x)[1] == 0;
}

Replace "char" with "unsigned char".

On systems where "char" is signed, the value there could be a trap
representation.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Thiago Macieira

2017-11-22 18:14:12 UTC

Post by Thiago Macieira
With one modification, I'd say it's indeterminate behaviour, not UB. The
value of padding bytes is unspecified, but it is valid to access it, so it
can't be UB.

Post by Hyman Rosen
struct X { char c; double d; };
int main() {
X x = { 'a', 1.1 };
return reinterpret_cast<char*>(&x)[1] == 0;
}

After reading Nicol's reply, I changed my mind. The above is UB.

You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

I'm not sure whether returning an unspecified value counts as taking a
condition on it. I would suppose not, since one could implement it using:

unsigned char byteAt(const void *ptr, ptrdiff_t offset)
{ return static_cast<const unsigned char *>(ptr)[offset]; }
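
As a usage sketch for such a helper, the following reads a byte that is known to belong to a member, where nobody in the thread disputes the result. first_member_byte and check_byte_at are hypothetical names, and the cast is written as a static_cast to const unsigned char so that constness is preserved.

```cpp
#include <cassert>
#include <cstddef>

struct X { char c; double d; };

// Return the byte at `offset` within the object that `ptr` points to.
inline unsigned char byteAt(const void* ptr, std::ptrdiff_t offset) {
    return static_cast<const unsigned char*>(ptr)[offset];
}

// Read the single byte that stores the member `c`; this byte is initialized.
inline unsigned char first_member_byte(const X& x) {
    return byteAt(&x, static_cast<std::ptrdiff_t>(offsetof(X, c)));
}

// Hypothetical self-check.
inline void check_byte_at() {
    X x = { 'a', 1.1 };
    assert(first_member_byte(x) == static_cast<unsigned char>('a'));
}
```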
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Myriachan

2017-11-22 18:48:58 UTC

Post by Thiago Macieira
After reading Nicol's reply, I changed my mind. The above is UB.
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.
I'm not sure that returning an unspecified value is considered a condition or
not. I would suppose not, since one could implement using:
unsigned char byteAt(const void *ptr, ptrdiff_t offset)
{ return static_cast<const unsigned char *>(ptr)[offset]; }

Why is it undefined behavior to do comparisons on unspecified values? This
is something that seems broken to me in the current C and C++ standard. It
really ought to be the case that unspecified values result in unpredictable
results of the comparison operator, not demons flying out of your nose.

Melissa

Nicol Bolas

2017-11-22 19:01:52 UTC

Post by Myriachan

Why is it undefined behavior to do comparisons on unspecified values?

Because the unspecified value may not be a legal value for the type, and
therefore comparison is not defined for it.

Thiago Macieira

2017-11-22 19:16:09 UTC

Post by Nicol Bolas

Post by Myriachan

Post by Thiago Macieira
I'm not sure that returning an unspecified value is considered a condition or
not. I would suppose not, since one could implement using:
unsigned char byteAt(const void *ptr, ptrdiff_t offset)
{ return static_cast<const unsigned char *>(ptr)[offset]; }

Why is it undefined behavior to do comparisons on unspecified values?

I believe the theoretical answer is that we can't define what the behaviour
would be if we don't know what the value is. We can say that the unspecified
value can be copied around and that if you put it back where it was, the
object will have remained unmodified. This allows memcpy to exist.

The practical answer comes from IA-64 and those Not-a-Thing bits. When you
declare a variable and don't initialise it, the variable may have been
allocated to a register that is NaT'ed. You can copy it around and even spill
to memory (using a special instruction suitably named st8.spill), but you
cannot make comparisons on it or your program will suffer a "NaT consumption
fault".

But even on IA-64, the ABI says that registers used for either parameters or
returned values must not be NaT.

Post by Nicol Bolas
Because the unspecified value may not be a legal value for the type, and
therefore comparison is not defined for it.

Is that true specifically for std::byte / unsigned char? By definition, there
are no illegal values for this type.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Hyman Rosen

2017-11-22 20:07:39 UTC

Post by Thiago Macieira
I believe the theoretical answer is that we can't define what the behaviour
would be if we don't know what the value is. We can say that the unspecified
value can be copied around and that if you put it back where it was, the
object will have remained unmodified. This allows memcpy to exist.

Note that the C++ Standard does not require the use of memcpy to do the
copying. In all cases referring to copying bytes, the Standard says things
like "for example, by using std::memcpy."

Post by Thiago Macieira
The practical answer comes from IA-64 and those Not-a-Thing bits. When you
declare a variable and don't initialise it, the variable may have been
allocated to a register that is NaT'ed. You can copy it around and even spill
to memory (using a special instruction suitably named st8.spill), but you
cannot make comparisons on it or your program will suffer a "NaT consumption
fault".

That's irrelevant. We're not talking about uninitialized variables, we're
talking about initialized ones. Given
struct X { char c; double d; } x = { 'a', 1.1 };,
x is an initialized variable. If the hardware requires it, then the
compiler has to arrange that the padding bytes be marked as initialized,
or else you need to find the wording in the Standard that allows what you
claim.

Hyman Rosen

2017-11-22 20:00:51 UTC

Post by Nicol Bolas

Post by Myriachan
Why is it undefined behavior to do comparisons on unspecified values?

Because the unspecified value may not be a legal value for the type, and
therefore comparison is not defined for it.

That is specifically false for char. [basic.fundamental] says
*A char, a signed char, and an unsigned char occupy the same amount of
storage and have the same alignment requirements (6.11); that is, they have
the same object representation. For narrow character types, all bits of the
object representation participate in the value representation.*
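
Two consequences of that wording can be checked mechanically. The sketch below is only an illustration with hypothetical function names; it verifies that unsigned char has as many value bits as a byte has bits, and that the three narrow character types have the same size.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Every bit of an unsigned char is a value bit: digits equals CHAR_BIT.
constexpr bool unsigned_char_uses_every_bit() {
    return std::numeric_limits<unsigned char>::digits == CHAR_BIT;
}

// char, signed char and unsigned char all occupy exactly one byte.
constexpr bool narrow_chars_share_size() {
    return sizeof(char) == 1 && sizeof(signed char) == 1 &&
           sizeof(unsigned char) == 1;
}

// Hypothetical self-check.
inline void check_narrow_chars() {
    assert(unsigned_char_uses_every_bit());
    assert(narrow_chars_share_size());
}
```

This says nothing about whether a particular byte holds an indeterminate value, which is the separate question raised elsewhere in the thread.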

Hyman Rosen

2017-11-22 19:58:08 UTC

Post by Thiago Macieira
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

By what evidence from the Standard is this the case?

Thiago Macieira

2017-11-23 17:20:48 UTC

Post by Hyman Rosen

Post by Thiago Macieira
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

By what evidence from the Standard is this the case?

See Chris's reply.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Hyman Rosen

2017-11-24 16:23:37 UTC

Post by Thiago Macieira

Post by Hyman Rosen

Post by Thiago Macieira
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

By what evidence from the Standard is this the case?

See Chris's reply.

I disagree. My x is an object for which initialization has been performed,
and therefore the rest of the quoted section does not apply.

Nicol Bolas

2017-11-24 16:54:34 UTC

Post by Hyman Rosen

Post by Thiago Macieira
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

By what evidence from the Standard is this the case?

See Chris's reply.

I disagree. My x is an object for which initialization has been
performed, and
therefore the rest of the quoted section does not apply.

But you're not interacting with `x`. You're interacting with an array of
bytes. Some of those byte objects were initialized and some of them were
not.

Hyman Rosen

2017-11-24 17:38:39 UTC

Post by Nicol Bolas
But you're not interacting with `x`. You're interacting with an array of
bytes. Some of those byte objects were initialized and some of them were not.

I'm interacting with the object representation of an initialized object.

In any case, I believe it is nonsense to say that it can be defined behavior
to copy a byte but undefined behavior to examine its value; the standard
does not define what it means to "copy," and as I posted before, I don't see
why code like the following would not be a "copy" operation:

void copy(char *to, char *from, size_t n) {
    while (n-- > 0) {
        switch (*from++) {
        case 0: *to++ = 0; break;
        case 1: *to++ = 1; break;
        //...
        }
    }
}
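
For comparison, a plain hand-written byte loop, with no switch, might look like the sketch below. byte_copy and check_byte_copy are hypothetical names. Whether such a loop may legitimately run over padding bytes is exactly what this thread disputes, so the sketch exercises it only on a double, whose bytes all belong to its value.

```cpp
#include <cassert>
#include <cstddef>

// Copy n bytes from `from` to `to` through unsigned char lvalues.
inline void byte_copy(void* to, const void* from, std::size_t n) {
    unsigned char* dst = static_cast<unsigned char*>(to);
    const unsigned char* src = static_cast<const unsigned char*>(from);
    while (n-- > 0) {
        *dst++ = *src++;
    }
}

// Hypothetical self-check on an object without padding.
inline void check_byte_copy() {
    double a = 1.1;
    double b = 0.0;
    byte_copy(&b, &a, sizeof a);
    assert(b == 1.1);
}
```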

Nicol Bolas

2017-11-24 21:09:40 UTC

Post by Hyman Rosen

Post by Nicol Bolas
But you're not interacting with `x`. You're interacting with an array of
bytes. Some of those byte objects were initialized and some of them were not.

Post by Hyman Rosen
I'm interacting with the object representation of an initialized object.

And I don't see the part of the standard that allows you to "interact" with
it. Only to copy it.

Post by Hyman Rosen
In any case, I believe it is nonsense to say that it can be defined behavior
to copy a byte but undefined behavior to examine its value;

And yet, here we are.

Post by Hyman Rosen
the standard does not define what it means to "copy,"

Just like the standard does not define what it means for one value to be
"less than" another, yet we still understand it. Words mean what they mean,
unless otherwise stated.

However, if you want to get pedantic, "copy"ing an object is well-defined.
The code you've posted is most assuredly not copying it.

Post by Hyman Rosen
and as I posted before, I don't see
why code like the following would not be a "copy" operation:

void copy(char *to, char *from, size_t n) {
    while (n-- > 0) {
        switch (*from++) {
        case 0: *to++ = 0; break;
        case 1: *to++ = 1; break;
        //...
        }
    }
}

Chris Hallock

2017-11-24 17:32:05 UTC

Post by Hyman Rosen

Post by Thiago Macieira
You *can* access bytes with unspecified values, but you can't take conditions
on them. That's what that == 0 does.

By what evidence from the Standard is this the case?

See Chris's reply.

I disagree. My x is an object for which initialization has been
performed, and
therefore the rest of the quoted section does not apply.

struct X { int a; X(){} };
int main() { X x; }

Surely you'd agree that x gets initialized, and x.a does not?
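
The distinction drawn here, between an object being initialized and each of its members being initialized, can be sketched with two variants of the same struct. Init, Dflt and the helper functions are hypothetical names added for illustration; the sketch deliberately never reads the member of a type like the X above, whose empty constructor leaves it indeterminate.

```cpp
#include <cassert>

// The constructor's member initializer list gives `a` a value.
struct Init {
    int a;
    Init() : a(0) {}
};

// A default member initializer does the same job for an empty constructor.
struct Dflt {
    int a = 0;
    Dflt() {}
};

inline int read_init() { Init i; return i.a; }
inline int read_dflt() { Dflt d; return d.a; }

// Hypothetical self-check: both members hold a well-defined value.
inline void check_member_init() {
    assert(read_init() == 0);
    assert(read_dflt() == 0);
}
```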

Hyman Rosen

2017-11-24 17:34:21 UTC

On Fri, Nov 24, 2017 at 12:32 PM, Chris Hallock <

Post by Hyman Rosen
I disagree. My x is an object for which initialization has been performed, and
therefore the rest of the quoted section does not apply.

Post by Chris Hallock
struct X { int a; X(){} };
int main() { X x; }
Surely you'd agree that x gets initialized, and x.a does not?

This is not an aggregate initialization. Aggregate initializers initialize
the entire aggregate.

Chris Hallock

2017-11-24 19:42:01 UTC

Post by Hyman Rosen
I disagree. My x is an object for which initialization has been performed, and
therefore the rest of the quoted section does not apply.

Post by Chris Hallock
struct X { int a; X(){} };
int main() { X x; }
Surely you'd agree that x gets initialized, and x.a does not?

Post by Hyman Rosen
This is not an aggregate initialization. Aggregate initializers
initialize the entire aggregate.

...except for unnamed bit-fields. "Unnamed bit-fields are not members and
cannot be initialized" ([class.bit]/2
<http://eel.is/c++draft/class.bit#2.sentence-2>). This is a divergence from
C.
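
A small sketch of that rule: in the aggregate below, the unnamed bit-field takes no initializer, so the two values in the braces go to the two named members. Flags, make_flags and check_flags are hypothetical names chosen for illustration.

```cpp
#include <cassert>

struct Flags {
    unsigned low  : 3;
    unsigned      : 5;   // unnamed bit-field: not a member, pure padding
    unsigned high : 8;
};

// The initializer list has two elements, one per named member.
inline Flags make_flags() {
    return Flags{ 5u, 200u };
}

// Hypothetical self-check: only the named members are inspected.
inline void check_flags() {
    Flags f = make_flags();
    assert(f.low == 5u);
    assert(f.high == 200u);
}
```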

CWG appears to want to treat padding bytes in the same fashion. "The
consensus was that unnamed bit-fields do constitute padding; more
generally, padding should be normatively defined, along the lines suggested
in 9.2.4 [class.bit] paragraphs 1-2." (notes from open issue 2253
<https://wg21.link/cwg2253>).

Nicol Bolas

2017-11-24 21:09:01 UTC

Post by Hyman Rosen
I disagree. My x is an object for which initialization has been performed, and
therefore the rest of the quoted section does not apply.

Post by Chris Hallock
struct X { int a; X(){} };
int main() { X x; }
Surely you'd agree that x gets initialized, and x.a does not?

Post by Hyman Rosen
This is not an aggregate initialization. Aggregate initializers
initialize the entire aggregate.

It initializes the aggregate by initializing the *members* of the aggregate:

When an aggregate is initialized by an initializer list as specified in
11.6.4, the elements of the initializer list are taken as initializers for
the elements of the aggregate, in order.

Padding bytes are not "elements of the aggregate". Therefore, they are not
initialized.

Tony V E

2017-11-22 23:08:12 UTC

Post by Hyman Rosen
Does the following code have undefined behavior (when there are padding
bytes)?
I would say no, for a variety of reasons - objects may be accessed through
char glvalues,
the bytes of an object can be copied out and copied back in, the C++
memory model is
intended to be compatible with C, etc., but I don't see an explicit
statement in the Standard.
Tools like valgrind complain about accessing uninitialized data in cases
like this too, which
is annoying.
struct X { char c; double d; };
int main() {
X x = { 'a', 1.1 };
return reinterpret_cast<char*>(&x)[1] == 0;
}

I can't quote the standard, but I believe this is meant to be valid
(although I'd prefer unsigned char or std::byte).
You can look at the underlying bytes of an object representation. Things
like "object representation" are defined in the standard, and they are
defined in terms of unsigned char, basically. And unsigned char is a value
you can read, always.

Yet, those bytes you are reading probably are uninitialized (assuming usual
alignment). So valgrind is complaining correctly.

The compiler can assume that second byte of x is 42 if it wants, return
false, and throw away the rest of the code.

But it can't change its mind unless x changes:

// can't fail
assert(reinterpret_cast<char*>(&x)[1] ==
reinterpret_cast<char*>(&x)[1]);

But

char was = reinterpret_cast<char*>(&x)[1];
x.c = 'a'; // not even a different value!
char now = reinterpret_cast<char*>(&x)[1];

// can fail:
assert(was == now);
--
Be seeing you,
Tony

Tony V E

2017-11-22 23:11:15 UTC

Post by Tony V E

Post by Hyman Rosen
Does the following code have undefined behavior (when there are padding
bytes)?
I would say no, for a variety of reasons - objects may be accessed
through char glvalues,
the bytes of an object can be copied out and copied back in, the C++
memory model is
intended to be compatible with C, etc., but I don't see an explicit
statement in the Standard.
Tools like valgrind complain about accessing uninitialized data in cases
like this too, which
is annoying.
struct X { char c; double d; };
int main() {
X x = { 'a', 1.1 };
return reinterpret_cast<char*>(&x)[1] == 0;
}

I can't quote the standard, but I believe this is meant to be valid
(although I'd prefer unsigned char or std::byte).
You can look at the underlying bytes of an object representation. Things
like "object representation" are defined in the standard, and they are
defined in terms of unsigned char, basically. And unsigned char is a value
you can read, always.
Yet, those bytes you are reading probably are uninitialized (assuming
usual alignment). So valgrind is complaining correctly.
The compiler can assume that second byte of x is 42 if it wants, return
false, and throw away the rest of the code.
// can't fail
assert(reinterpret_cast<char*>(&x)[1] ==
reinterpret_cast<char*>(&x)[1]);

Hmmm, the more I think about that line, I'm not even sure whether it needs
to hold. I wonder about:

X x = { 'a', 1.1 };
X y;
memcpy(&y,&x,sizeof(X));
assert(memcmp(&x, &y, sizeof(X)) == 0); // ??
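
A minimal sketch of the part of that question that does not seem to be in doubt, assuming the same X as in the original post: after a memcpy of the object representation, the members of the copy compare equal to the members of the source. Whether memcmp over the whole object, padding included, must also report equality is the open question, so the sketch deliberately does not test it. The helper names same_members and copy_bytes are made up for illustration.

```cpp
#include <cstring>

struct X { char c; double d; };

// Hypothetical helper: compares only the members, never the padding.
bool same_members(const X& a, const X& b) {
    return a.c == b.c && a.d == b.d;
}

// Hypothetical helper: copies the object representation of src into a
// fresh X, as in the memcpy line above, and returns the copy.
X copy_bytes(const X& src) {
    X dst;
    std::memcpy(&dst, &src, sizeof(X));
    return dst;
}
```

Comparing member by member sidesteps the padding entirely, which is why it is the usual advice for equality on structs like this one.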

Post by Tony V E
But
char was = reinterpret_cast<char*>(&x)[1];
x.c = 'a'; // not even a different value!
char now = reinterpret_cast<char*>(&x)[1];
assert(was == now);
--
Be seeing you,
Tony

--
Be seeing you,
Tony

Chris Hallock

2017-11-22 23:10:40 UTC

Presuming that padding bytes are objects, I believe your example is
undefined. I think this is unfortunate, because the equivalent code in C is
well-defined, albeit with an unspecified 0-or-1 return value.

In C, storing a value in an object of struct type sets padding bytes to
unspecified values (6.2.6.1/6), and unspecified values in C are valid,
non-trapping values (3.19.3).

In C++, padding bytes are not really well-described. If they are objects,
then padding bytes have *indeterminate values* per [dcl.init]/12
<http://eel.is/c++draft/dcl.init#12> if the struct wasn't zero-initialized.
Indeterminate values can only be used in the limited ways described in
[dcl.init]/12, and comparison is right out.
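
For contrast, a small sketch of the zero-initialized case, assuming the zero-initialization wording that sets the padding of a class object to zero bits applies to an object with static storage duration. The object name zeroed and the helper padding_is_zero are made up for illustration, and the helper is only meant to be called on an object known to be zero-initialized; calling it on the x from the original post would raise exactly the question discussed here.

```cpp
#include <cstddef>

struct X { char c; double d; };

// Static storage duration: zero-initialized before any other initialization.
static X zeroed;

// Hypothetical helper: true if every byte between c and d reads as zero.
// Intended only for objects whose padding is known to be zero-initialized.
bool padding_is_zero(const X& x) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&x);
    for (std::size_t i = sizeof(char); i < offsetof(X, d); ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```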

Thiago Macieira

2017-11-25 00:06:40 UTC

Post by Hyman Rosen
Does the following code have undefined behavior (when there are padding
bytes)?
I would say no, for a variety of reasons - objects may be accessed through
char glvalues,
the bytes of an object can be copied out and copied back in, the C++ memory
model is
intended to be compatible with C, etc., but I don't see an explicit
statement in the Standard.
Tools like valgrind complain about accessing uninitialized data in cases
like this too, which
is annoying.
struct X { char c; double d; };
int main() {
X x = { 'a', 1.1 };
return reinterpret_cast<char*>(&x)[1] == 0;
}

I think we should ask the question:

Why do you care? What are you trying to do that reduces to the case above?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Richard Hodges

2017-11-25 17:19:08 UTC

Restating the question in terms of memcpy to an array:

I don't think the program below is in any way undefined behaviour.
It is perfectly (implementation) defined how many hex values will appear on
stdout.
The values of the bytes themselves are implementation defined or undefined,
depending on the byte, but they are values nonetheless. They must be. They
were created on line 13 and initialised on line 14. It's unequivocal.

The fact that we can memcpy an object to data and then back in a defined
way must imply that all bytes in the object X are validly
readable and writeable.
It seems to me that they cannot be trap values, otherwise memcpy could not
work.

#include <cstring>
#include <type_traits>
#include <memory>
#include <iostream>
#include <iomanip>

struct X { char c; double d; };


int main() {
    X x = { 'a', 1.1 };

    unsigned char byte_buffer[sizeof(X)];
    std::memcpy(byte_buffer, std::addressof(x), sizeof(x));

    for(auto uc : byte_buffer)
    {
        std::cout << std::hex << std::setw(2) << std::setfill('0') << unsigned(uc) << ", ";
    }
}

example output:

61, 00, 00, 00, 00, 00, 00, 00, 9a, 99, 99, 99, 99, 99, f1, 3f,

link: http://coliru.stacked-crooked.com/a/408e699c9fc98afd
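
If the goal is output that does not depend on whatever happens to sit in the padding, one option is to overwrite the padding positions in the copied buffer before looking at them. The following is a minimal sketch under the assumption that the only padding in X lies between c and d, plus any tail after d; the helper name copy_masked is made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

struct X { char c; double d; };

// Hypothetical helper: copies x into out (which must hold sizeof(X) bytes)
// and then zeroes the byte positions that do not belong to c or d.
void copy_masked(const X& x, unsigned char* out) {
    std::memcpy(out, &x, sizeof(X));

    // Bytes between the end of c and the start of d.
    std::memset(out + sizeof(char), 0, offsetof(X, d) - sizeof(char));

    // Any bytes after d (none on typical layouts, but checked anyway).
    const std::size_t tail = offsetof(X, d) + sizeof(double);
    if (tail < sizeof(X)) {
        std::memset(out + tail, 0, sizeof(X) - tail);
    }
}
```

Writing to the buffer gives those positions determinate values, so printing or comparing the buffer afterwards no longer touches anything copied from padding.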

The original version, using a reinterpret_cast, seems to me to be strictly
undefined behaviour because X "is not a" char [] (yes, I know it is really,
but in compiler memory model doublespeak it's conceptually not).
Since the compiler will see you doing something UBish, it's within its
rights to produce surprising code.

Using reinterpret_cast to cast a pointer-to-Thing to
pointer-to-SomethingElse is UB if you access the SomethingElse, no?

As a parting opinion, I think it's perfectly ok to define the C++ language
this way, but I really do think the standard should compel compilers to
emit a diagnostic in the case of unequivocal UB like this (ideally with a
suggestion of the correct approach).

Otherwise, this bytes-or-objects argument is set to run and run.

Richard Hodges

2017-11-25 17:29:29 UTC

Sorry for the spam-post.

Interestingly, refactoring the program to use the UB method does indeed
produce surprising output (gcc, -O3):

#include <cstring>
#include <type_traits>
#include <memory>
#include <iostream>
#include <iomanip>

struct X { char c; double d; };


int main() {
    X x = { 'a', 1.1 };

    unsigned char* byte_buffer = reinterpret_cast<unsigned char *>(std::addressof(x));

    auto first = byte_buffer, last = byte_buffer + sizeof(x);
    for( ; first != last ; ++first )
    {
        std::cout << std::hex << std::setw(2) << std::setfill('0') << unsigned(*first) << ", ";
    }
}

example output:

61, 09, 40, 00, 00, 00, 00, 00, 9a, 99, 99, 99, 99, 99, f1, 3f,
    ^^  ^^ !!!

Tony V E

2017-11-25 17:37:03 UTC

You are surprised the padding bytes are a different value than the other way?

Sent from my BlackBerry portable Babbage Device

Richard Hodges

2017-11-25 17:59:23 UTC

Post by Tony V E
You are surprised the padding bytes are a different value than the other
way?

No, but it's a nice unequivocal demonstration of the UB-ness when compared
with the defined way of doing it (previous post).

Tony V E

2017-11-25 18:23:38 UTC

<html><head></head><body lang="en-US" style="background-color: rgb(255, 255, 255); line-height: initial;"> <div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">That's not UB, just indeterminate. </div> <div style="width: 100%; font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);"><br style="display:initial"></div> <div style="font-size: initial; font-family: Calibri, 'Slate Pro', sans-serif, sans-serif; color: rgb(31, 73, 125); text-align: initial; background-color: rgb(255, 255, 255);">Sent from my BlackBerry portable Babbage Device</div> <table width="100%" style="background-color:white;border-spacing:0px;"> <tbody><tr><td colspan="2" style="font-size: initial; text-align: initial; background-color: rgb(255, 255, 255);"> <div style="border-style: solid none none; border-top-color: rgb(181, 196, 223); border-top-width: 1pt; padding: 3pt 0in 0in; font-family: Tahoma, 'BB Alpha Sans', 'Slate Pro'; font-size: 10pt;"> <div><b>From: </b>Richard Hodges</div><div><b>Sent: </b>Saturday, November 25, 2017 12:59 PM</div><div><b>To: </b>std-***@isocpp.org</div><div><b>Reply To: </b>std-***@isocpp.org</div><div><b>Subject: </b>Re: [std-discussion] Value of padding bytes</div></div></td></tr></tbody></table><div style="border-style: solid none none; border-top-color: rgb(186, 188, 209); border-top-width: 1pt; font-size: initial; text-align: initial; background-color: rgb(255, 255, 255);"></div><br><div id="_originalContent" style=""><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 25 November 2017 at 18:37, Tony V E <span dir="ltr"><<a href="mailto:***@gmail.com" target="_blank">***@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex"><div lang="en-US" style="background-color:rgb(255,255,255);line-height:initial"> <div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">You are surprised the padding bytes are a different value than the other way?</div></div></blockquote><div><br></div><div>No, but it's a nice unequivocal demonstration of the UB-ness when compared with the defined way of doing it (previous post). </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="en-US" style="background-color:rgb(255,255,255);line-height:initial"><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)"><br></div> <div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)"><br style="display:initial"></div> <div style="font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">Sent from my BlackBerry <wbr>portable Babbage Device</div> <table width="100%" style="background-color:white;border-spacing:0px"> <tbody><tr><td colspan="2" style="font-size:initial;text-align:initial;background-color:rgb(255,255,255)"> <div style="border-style:solid none none;border-top-color:rgb(181,196,223);border-top-width:1pt;padding:3pt 0in 0in;font-family:Tahoma,'BB Alpha Sans','Slate Pro';font-size:10pt"> <div><b>From: </b>Richard Hodges</div><div><b>Sent: </b>Saturday, November 25, 2017 12:29 PM</div><div><b>To: </b><a href="mailto:std-***@isocpp.org" target="_blank">std-***@isocpp.org</a></div><div><b>Reply To: </b><a href="mailto:std-***@isocpp.org" target="_blank">std-***@isocpp.org</a></div><div><b>Subject: 
</b>Re: [std-discussion] Value of padding bytes</div></div></td></tr></tbody></table><div><div class="h5"><div style="border-style:solid none none;border-top-color:rgb(186,188,209);border-top-width:1pt;font-size:initial;text-align:initial;background-color:rgb(255,255,255)"></div><br><div id="m_-4408439978304409036_originalContent"><div dir="ltr">Sorry for the spam-post.<div><br></div><div>Interestingly, refactoring the program to use the UB method does indeed produce surprising output (gcc, -O3):</div><div><br></div><div><div><font face="monospace, monospace">#include <cstring></font></div><div><font face="monospace, monospace">#include <type_traits></font></div><div><font face="monospace, monospace">#include <memory></font></div><div><font face="monospace, monospace">#include <iostream></font></div><div><font face="monospace, monospace">#include <iomanip></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">struct X { char c; double d; };</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">int main() {</font></div><div><font face="monospace, monospace">    X x = { 'a', 1.1 };</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    unsigned char* byte_buffer = reinterpret_cast<unsigned char *>(std::addressof(x));</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    auto first = byte_buffer, last = byte_buffer + sizeof(x);</font></div><div><font face="monospace, monospace">    for( ; first != last ; ++first )</font></div><div><font face="monospace, monospace">    {</font></div><div><font face="monospace, monospace">        std::cout << std::hex << std::setw(2) << std::setfill('0') << unsigned(*first) << ", ";</font></div><div><font face="monospace, monospace">    }</font></div><div><font 
face="monospace, monospace">}</font></div></div><div><br></div><div>example output:</div><div><br></div><div><font face="monospace, monospace">61, 09, 40, 00, 00, 00, 00, 00, 9a, 99, 99, 99, 99, 99, f1, 3f, <br></font></div><div><font face="monospace, monospace">    ^^  ^^ !!!</font></div><div><font face="monospace, monospace"><br></font></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 25 November 2017 at 18:19, Richard Hodges <span dir="ltr"><<a href="mailto:***@gmail.com" target="_blank">***@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Restating the question in terms of memcpy to an array:<div><br></div><div>I don't think the program below is in any way undefined behaviour. </div><div>It is perfectly (implementation) defined how many hex values will appear on stdout. </div><div>The values of the bytes themselves is implementation defined or undefined, depending on the byte, but they are values nonetheless. They must be. They were created on line 13 and initialised on line 14. It's unequivocal.</div><div><br></div><div>The fact that we can memcpy an object to data and then back in a defined way must implicitly imply that all bytes in the object X are validly readable and writeable. 
</div><div>It seems to me that they cannot be trap values otherwise memcpy could not work.</div><div><br></div><div><div><font face="monospace, monospace">#include <cstring></font></div><div><font face="monospace, monospace">#include <type_traits></font></div><div><font face="monospace, monospace">#include <memory></font></div><div><font face="monospace, monospace">#include <iostream></font></div><div><font face="monospace, monospace">#include <iomanip></font></div><span><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">struct X { char c; double d; };</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">int main() {</font></div><div><font face="monospace, monospace">    X x = { 'a', 1.1 };</font></div><div><font face="monospace, monospace"><br></font></div></span><div><font face="monospace, monospace">    unsigned char byte_buffer[sizeof(X)];</font></div><div><font face="monospace, monospace">    std::memcpy(byte_buffer, std::addressof(x), sizeof(x));</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    for(auto uc : byte_buffer)</font></div><div><font face="monospace, monospace">    {</font></div><div><font face="monospace, monospace">        std::cout << std::hex << std::setw(2) << std::setfill('0') << unsigned(uc) << ", ";</font></div><div><font face="monospace, monospace">    }</font></div><div><font face="monospace, monospace">}</font></div></div><div><br></div><div>example output:</div><div><br></div><div><font face="monospace, monospace">61, 00, 00, 00, 00, 00, 00, 00, 9a, 99, 99, 99, 99, 99, f1, 3f, <br></font></div><div><br></div><div>link: <a href="http://coliru.stacked-crooked.com/a/408e699c9fc98afd" target="_blank">http://coliru.stacked-cr<wbr>ooked.com/a/408e699c9fc98afd</a></div><div><br></div><div>The original version, using a 
reinterpret_cast seems to me to be strictly undefined behaviour because X "is not a" char [] (yes I know it is really, but in compiler memory model doublespeak it's conceptually not).</div><div>Since the compiler will see you doing something UBish it's within its rights to produce surprising code.</div><div><br></div><div>Using reinterpret_cast to cast a pointer-to-Thing to pointer-to-SomethingElse is UB if you access the SomethingElse, no?</div><div><br></div><div>As a parting opinion, I think it's perfectly ok to define the c++ language this way, but I really do think the standard should compel compilers to emit a diagnostic in the case of unequivocal UB like this (ideally with a suggestion of the correct approach). </div><div><br></div><div>Otherwise, this bytes-or-objects argument is set to run and run.</div><div><br></div><div> </div></div><div class="m_-4408439978304409036HOEnZb"><div class="m_-4408439978304409036h5"><div class="gmail_extra"><br><div class="gmail_quote">On 25 November 2017 at 01:06, Thiago Macieira <span dir="ltr"><<a href="mailto:***@macieira.org" target="_blank">***@macieira.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On quarta-feira, 22 de novembro de 2017 08:56:29 PST Hyman Rosen wrote:<br> </span><span>> Does the following code have undefined behavior (when there are padding<br>
> bytes)?<br>
><br>
> I would say no, for a variety of reasons - objects may be accessed through<br>
> char glvalues,<br>
> the bytes of an object can be copied out and copied back in, the C++ memory<br>
> model is<br>
> intended to be compatible with C, etc., but I don't see an explicit<br>
> statement in the Standard.<br>
><br>
> Tools like valgrind complain about accessing uninitialized data in cases<br>
> like this too, which<br>
> is annoying.<br>
><br>
> struct X { char c; double d; };<br>
> int main() {<br>
>     X x = { 'a', 1.1 };<br>
>     return reinterpret_cast<char*>(&x)[1] == 0;<br>
> }<br>
<br>
</span>I think we should ask the question:<br>
<br>
Why do you care? What are you trying to do that reduced to the case above?<br>
<span class="m_-4408439978304409036m_-1424870224114001820im m_-4408439978304409036m_-1424870224114001820HOEnZb"><br>
--<br>
Thiago Macieira - thiago (AT) <a href="http://macieira.info" rel="noreferrer" target="_blank">macieira.info</a> - thiago (AT) <a href="http://kde.org" rel="noreferrer" target="_blank">kde.org</a><br>
   Software Architect - Intel Open Source Technology Center<br>
<br>
</span><div class="m_-4408439978304409036m_-1424870224114001820HOEnZb"><div class="m_-4408439978304409036m_-1424870224114001820h5">--<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>

<p></p>

<br></div></div></div></div><div class="HOEnZb"><div class="h5">

<p></p>

</div></div></blockquote></div><br></div></div>

<p></p>

<br></div></body></html>

<p></p>


Nicol Bolas

2017-11-25 21:21:31 UTC

Permalink

Post by Tony V E
That's not UB, just indeterminate.

But as explained, doing things with uninitialized values, except for
certain very specific things, yields undefined behavior.


Hyman Rosen

2017-11-26 07:28:16 UTC

Permalink

Post by Thiago Macieira
Why do you care? What are you trying to do that reduced to the case above?

Code like this:

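#include <string.h>  // memset, memcpy, size_t

template <typename BitwiseCopyable>
void fill(BitwiseCopyable *to, const BitwiseCopyable &from, size_t count)
{
    bool use_memset = true;
    // p must be const char*: the cast result cannot initialize a plain char*.
    const char *p = reinterpret_cast<const char *>(&from);
    // Check whether every byte of 'from' (padding included) equals the first.
    for (size_t i = 1; i < sizeof(BitwiseCopyable) && use_memset; ++i) {
        use_memset = p[i] == p[0];
    }
    if (use_memset) {
        // All bytes identical: fill the whole destination range in one call.
        memset(to, p[0], count * sizeof(BitwiseCopyable));
    }
    else {
        // Otherwise copy the object representation element by element.
        for (size_t i = 0; i < count; ++i) {
            memcpy(to + i, &from, sizeof(BitwiseCopyable));
        }
    }
}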


Richard Hodges

2017-11-26 08:33:02 UTC

Permalink

Post by Hyman Rosen

Post by Thiago Macieira
Why do you care? What are you trying to do that reduced to the case above?

While I share your disquiet about objects not really being bytes in the C++
memory model, I think this function is UB (in the sense that the standard
does not define what it should do).

The reason I say this is the use of memset.

The C++ standard basically says of memset, "see the C standard".

The C11 standard says, "The memset function copies the value of c
(converted to an unsigned char) into each of the first n characters of the
object pointed to by s."

Well, in C++, the thing at s does not "have n characters" because in C++
Newspeak it's an object, not a sequence of bytes. I can't think of a way to
write this function in terms of memset that is not UB.



Richard Hodges

2017-11-26 08:49:57 UTC

Permalink

Indeed, writing it in terms of std::fill actually produces better code:

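#include <algorithm>  // std::fill
#include <cstddef>    // size_t

template <typename BitwiseCopyable>
__attribute__((noinline))  // GCC/Clang attribute; keeps the function visible in the assembly
void fill(BitwiseCopyable *to, const BitwiseCopyable &from, size_t count)
{
    // Let the library and optimizer choose the copy strategy.
    std::fill(to, to + count, from);
}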

Generated code (g++, --native, -O3):

void fill<F>(F*, F const&, unsigned long):
sal rdx, 4
add rdx, rdi
cmp rdx, rdi
je .L5
.L3:
vmovdqu xmm0, XMMWORD PTR [rsi]
add rdi, 16
vmovups XMMWORD PTR [rdi-16], xmm0
cmp rdx, rdi
jne .L3
.L5:
ret



Hyman Rosen

2017-11-26 22:10:29 UTC

Permalink

Post by Richard Hodges
While I share your disquiet about objects not really being bytes in the
c++ memory model, I think this function is UB (in the sense that the
standard does not define what it should do).
The reason I say this is the use of memset.
c++'s standard basically says of memset, "see the c standard".
The c11 standard says, "The memset function copies the value of c
(converted to an unsigned char) into each of the first n characters of the
object pointed to by s."
Well, in c++, the value at s does not "have n characters" because in c++
Newspeak it's an object, not a sequence of bytes. I can't think of a way to
write this function in terms of memset that is not UB.

I don't understand. N4659 says in [basic.types]

*For any trivially copyable type T, if two pointers to T point to distinct
T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class
subobject, if the underlying bytes making up obj1 are copied into obj2, obj2
shall subsequently hold the same value as obj1.*
If I have determined that all of the underlying bytes making up obj1 have
the same value,
you nevertheless claim that using memset to copy those bytes into obj2 is
not "copying"
in the sense of the above statement? That seems deliberately obtuse.
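
For reference, here is a minimal sketch of the case that the quoted [basic.types] paragraph covers without dispute: the bytes of one object are copied into another with memcpy, directly or through an unsigned char buffer. The helper names copy_bytes and round_trip are made up for illustration and do not come from the thread. The sketch assumes the X from the original post and never inspects the values of the padding bytes.

```cpp
#include <cstring>
#include <type_traits>

struct X { char c; double d; };
static_assert(std::is_trivially_copyable<X>::value,
              "the guarantee only applies to trivially copyable types");

// Hypothetical helper: copies the underlying bytes of src into dst.
inline void copy_bytes(X& dst, const X& src) {
    std::memcpy(&dst, &src, sizeof(X));
}

// Hypothetical helper: sends src through a byte buffer and rebuilds an X.
inline X round_trip(const X& src) {
    unsigned char buf[sizeof(X)];
    std::memcpy(buf, &src, sizeof(X));  // object -> bytes
    X out;
    std::memcpy(&out, buf, sizeof(X));  // bytes -> object
    return out;
}
```

After copy_bytes(b, a), b.c and b.d compare equal to a.c and a.d. Whether filling obj2 by memset counts as the same kind of "copy" is exactly the point in dispute here, and this sketch does not settle it.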


Nicol Bolas

2017-11-26 22:32:58 UTC

Permalink

Post by Hyman Rosen

I don't understand. N4659 says in [basic.types]
*For any trivially copyable type T, if two pointers to T point to distinct
T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class
subobject, if the underlying bytes making up obj1 are copied into obj2, obj2
shall subsequently hold the same value as obj1.*
If I have determined that all of the underlying bytes making up obj1 have
the same value,

Let's assume that nothing you did in making that evaluation constituted UB.

Post by Hyman Rosen
you nevertheless claim that using memset to copy those bytes into obj2 is
not "copying"
in the sense of the above statement?

No, it is not. The word "copy" does not mean "insert the same value". It
means "copy". That is, this is a copy:

a = b;

This is not a copy:

switch(b)
{
    case 0: a = 0; break;
    case 1: a = 1; break;
    ...
}

"The effect of a copy through non-copying means" is not the same thing as
"copy".

Post by Hyman Rosen
That seems deliberately obtuse.

No, it's precise. Words mean what they say.


Thiago Macieira

2017-11-26 19:08:49 UTC

Permalink

Post by Thiago Macieira
Why do you care? What are you trying to do that reduced to the case above?

Thanks.

But do note that your function may not use memset for an object like
{'\0', 0.0}
in some conditions, because the 3-7 padding bytes may not be zero.

So this function may or may not do your optimisation, even if there's no UB. I
don't think this is a very good use case.

And to make the problem worse, with an object of 16 bytes like the example,
the memset and memcpy will very likely be equally efficient. So in some cases,
the comparison at the beginning is not an optimisation, it's a pessimisation.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center


Hyman Rosen

2017-11-26 21:30:30 UTC

Permalink

Post by Thiago Macieira
Why do you care? What are you trying to do that reduced to the case above?

Thanks.

But do note that your function may not use memset for an object like
{'\0', 0.0}
in some conditions, because the 3-7 padding bytes may not be zero.

So this function may or may not do your optimisation, even if there's no
UB. I
don't think this is a very good use case.

And to make the problem worse, with an object of 16 bytes like the example,
the memset and memcpy will very likely be equally efficient. So in some
cases,
the comparison at the beginning is not an optimisation, it's a
pessimisation.

As this is the group for discussion of the Standard, the point is not
whether the code is appropriate but whether it is legal. This is similar
to code that we have had in production for years; periodically there are
complaints about valgrind warnings, and it is useful to know the legality
of the code to help us decide its future.
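
One possible way to sidestep the padding question, sketched here under stated assumptions rather than offered as a ruling on legality: take the byte-comparison path only for types that the C++17 trait std::has_unique_object_representations reports as having no padding, and fall back to per-element memcpy otherwise. The names fill_checked and all_bytes_equal are hypothetical and do not appear in the production code mentioned above. The sketch does not resolve whether memset is a "copy" in the [basic.types] sense.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct X { char c; double d; };

// Hypothetical helper: true when every byte of obj equals its first byte.
// Restricted to types whose object representation has no padding bits.
template <typename T>
bool all_bytes_equal(const T& obj) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "only for types without padding");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &obj, sizeof(T));  // read the bytes via memcpy
    for (std::size_t i = 1; i < sizeof(T); ++i) {
        if (bytes[i] != bytes[0]) return false;
    }
    return true;
}

// Hypothetical variant of fill(): padding bytes are never compared.
template <typename T>
void fill_checked(T* to, const T& from, std::size_t count) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise filling needs a trivially copyable type");
    if constexpr (std::has_unique_object_representations_v<T>) {
        if (all_bytes_equal(from)) {
            unsigned char first;
            std::memcpy(&first, &from, 1);
            std::memset(to, first, count * sizeof(T));
            return;
        }
    }
    // Types with padding (or with differing bytes) take the plain copy path.
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(to + i, &from, sizeof(T));
    }
}
```

With this gating, a type such as X, which has padding on common ABIs, always takes the memcpy loop, so valgrind has no uninitialized read to report for it.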


Thiago Macieira

2017-11-27 03:54:34 UTC

Permalink

Post by Hyman Rosen
As this is the group for discussion of the Standard, the point is not
whether the code is appropriate but whether it is legal. This is similar
to code that we have had in production for years; periodically there are
complaints about valgrind warnings, and it is useful to know the legality
of the code to help us decide its future.

I don't believe it's legal and I do believe the Valgrind warnings are valid.

Even if your code is legal, it's a poor idea. Don't try to force a byte-level
memset() when modern CPUs can do 16- or 32-byte copies.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
