d***@gmail.com
2015-04-12 19:28:43 UTC
Passing (array_ptr=NULL, size=0) to the library character array functions (memcpy, memmove, ...) is technically forbidden (see the references to the standard at the end of this message).
This seems to be a missed corner case, not an intentional restriction.
GCC 4.9 has started to exploit this for optimization ( https://gcc.gnu.org/gcc-4.9/porting_to.html ), so this technical detail about undefined behaviour with {array_ptr=NULL, size=0} is no longer a theoretical issue but a very important one in practice.
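To make this concrete, here is a minimal sketch of the kind of code that is affected. The function name copy_then_check is hypothetical, and whether the check is actually removed depends on the compiler version and flags; this only shows the pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not taken from any real project. */
static int copy_then_check(char *dst, const char *src, size_t n)
{
    /* Because null pointers are invalid arguments even when n == 0,
       the compiler may conclude here that dst != NULL and src != NULL. */
    memcpy(dst, src, n);

    /* ...so it may treat this test as always false and drop the branch. */
    if (dst == NULL)
        return -1;

    return 0;
}
```

A caller that relied on copy_then_check(NULL, NULL, 0) returning -1 can no longer count on that result.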
In practice it has become an anti-optimization, because of redundant checks such as {if(size) memcpy(dst,src,size)}. This 'if(size)' is already handled inside memcpy implementations. Some projects are even forced to disable the optimization based on deduced null/non-null pointers for the whole project, not only for memcpy ( http://blog.mycre.ws/articles/bind-and-gcc-49/ ), so the optimization has the opposite of its intended effect.
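The redundant check usually ends up in a small wrapper like the sketch below. The name copy_bytes is hypothetical; the assert is only there to document the assumption that the pointers must be non-null once n is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: never passes null pointers to memcpy when n == 0. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n != 0) {
        /* For a non-empty copy both pointers must be valid anyway. */
        assert(dst != NULL && src != NULL);
        memcpy(dst, src, n);
    }
    return dst;
}
```

With this wrapper, copy_bytes(NULL, NULL, 0) is well defined, at the price of a branch that the memcpy implementation typically repeats internally.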
Sure, it is useful for optimization purposes to deduce that the dst/src pointers are not NULL from the algorithm {for(size_t i=0;i<size;++i)dst[i]=src[i]}, but only when size!=0. When size is 0, the loop body never runs, so this conclusion does not hold.
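For comparison, here is the trivial loop written out as a complete function. The names naive_copy and naive_copy_demo are hypothetical. With size == 0 the loop runs zero times and neither pointer is dereferenced, so null arguments are harmless here, unlike with the library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-wise copy, written as the trivial loop from the text. */
static void *naive_copy(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < size; ++i)
        d[i] = s[i];            /* only reached when size != 0 */

    return dst;
}

/* Small self-check: a zero-length copy with null pointers, then a real copy. */
static void naive_copy_demo(void)
{
    char buf[3] = {0};

    naive_copy(NULL, NULL, 0);  /* zero iterations, nothing is dereferenced */
    naive_copy(buf, "hi", 3);   /* copies 'h', 'i' and the terminating '\0' */
    assert(buf[0] == 'h' && buf[1] == 'i' && buf[2] == '\0');
}
```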
It is surprising that the algorithm written as a trivial C implementation should differ subtly from the library function, and that this difference is used only for additional reasoning about the arguments, not by the algorithm itself.
It is surprising that a copy algorithm is so tightly bound to additional effects.
Can this be filed as a defect report?
Proposed addition: allow some classes of invalid pointers (one past the end of an array, null pointers) to be passed to the library array functions if the size of the array is zero.
Current implementations of the memcpy functions most likely handle this case already.
I checked the glibc x86_64 implementation ( ./sysdeps/x86_64/memcpy.S - https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/memcpy.S;h=d6cd553a266c56c0dca3f00bf5e1dcf071b57b9f;hb=HEAD ).
For memcpy(dst, src, n): rdi=dst, rsi=src, rdx=n.
For n smaller than 32 bytes, this function checks the bit pattern of rdx and, if all bits are zero, explicitly exits the function ( see L(1d): / andl $0xf0, %edx / jz L(exit) ).
=======================================
Related excerpts from WG14/N1124 - Committee Draft - May 6, 2005 - ISO/IEC 9899:TC2:
String function conventions
7.21.1/2
Where an argument declared as size_t n specifies the length of the array for a
function, n can have the value zero on a call to that function. Unless explicitly stated
otherwise in the description of a particular function in this subclause, pointer arguments
on such a call shall still have valid values, as described in 7.1.4. On such a call, a
function that locates a character finds no occurrence, a function that compares two
character sequences returns zero, and a function that copies characters copies zero
characters.
7.1.4 Use of library functions
7.1.4/1
Each of the following statements applies unless explicitly stated otherwise in the detailed
descriptions that follow: If an argument to a function has an invalid value (such as a value
outside the domain of the function, or a pointer outside the address space of the program,
or a null pointer, or a pointer to non-modifiable storage when the corresponding
parameter is not const-qualified) or a type (after promotion) not expected by a function
with variable number of arguments, the behavior is undefined. If a function argument is
described as being an array .......
=======================================
Possibly related discussions:
https://groups.google.com/forum/m/#!searchin/comp.std.c/memcpy/comp.std.c/QkQkcvqfYKQ
https://groups.google.com/forum/m/#!searchin/comp.std.c/memcpy/comp.std.c/XLKwhmB8L5s