Discussion:
FYI: dfa: add an assertion to avoid coverity false positive
Jim Meyering
2016-12-14 06:49:16 UTC
It took me a few minutes to convince myself that a Coverity warning
was unwarranted, so I've added an assert that should suppress it.

If someone sees a better way (i.e., with no assertion), please suggest a patch.
Paul Eggert
2016-12-14 21:46:24 UTC
Post by Jim Meyering
It took me a few minutes to convince myself that a Coverity warning
was unwarranted, so I've added an assert that should suppress it.
I looked at it for a bit and found a way to justify some sort of warning
in that area, although the failure is extremely unlikely. In
realloc_trans_if_necessary, if new_state equals PTRDIFF_MAX - 1 then
'new_state + 2' overflows, yielding undefined behavior. This can happen
on weird platforms where PTRDIFF_MAX is much less than SIZE_MAX (e.g.,
32-bit ptrdiff_t and 64-bit size_t).

Admittedly I don't know of any such platforms. Still, over time I am
becoming more inclined to like the Emacs model, where object counts are
typically kept as nonnegative but signed integers. This approach makes C
code a bit more reliable, as compiling with -fsanitize=undefined is more
likely to catch integer overflow errors in index calculations (a real
problem nowadays). The Emacs allocators check that counts fit into both
ptrdiff_t and size_t (the latter for safety when dealing with low-level
functions that use size_t). Although the Emacs approach cannot allocate
char buffers with more than PTRDIFF_MAX bytes, in some sense this is a
good thing since pointer subtraction does not work properly with those
buffers anyway. Besides, dfa.c's state_num is already using ptrdiff_t
for object counts, so it's not significantly restricting dfa.c if we
change it to use ptrdiff_t in some more places.

To make a long story short, I installed the attached patch, and hope it
pacifies Coverity without the need for that assertion.
Bruno Haible
2016-12-14 22:56:28 UTC
Hi Paul,
Post by Paul Eggert
over time I am
becoming more inclined to like the Emacs model, where object counts are
typically kept as nonnegative but signed integers. This approach makes C
code a bit more reliable, as compiling with -fsanitize=undefined is more
likely to catch integer overflow errors in index calculations (a real
problem nowadays).
Are you saying that -fsanitize=undefined or -fsanitize=signed-integer-overflow
(or -ftrapv, when using an older GCC) can detect integer overflow for signed
integers, whereas no such option exists, and won't exist, for unsigned integers
(because so many pieces of code intentionally do mod-2^32
or mod-2^64 computations on unsigned integers)?

And what about the gnulib 'xsize' module for checked size_t computations?

Bruno
Eric Blake
2016-12-14 23:05:20 UTC
Post by Bruno Haible
Hi Paul,
Post by Paul Eggert
over time I am
becoming more inclined to like the Emacs model, where object counts are
typically kept as nonnegative but signed integers. This approach makes C
code a bit more reliable, as compiling with -fsanitize=undefined is more
likely to catch integer overflow errors in index calculations (a real
problem nowadays).
Are you saying that -fsanitize=undefined or -fsanitize=signed-integer-overflow
(or -ftrapv, when using an older GCC) can detect integer overflow for signed
integers, whereas no such option exists, and won't exist, for unsigned integers
(because so many pieces of code intentionally do mod-2^32
or mod-2^64 computations on unsigned integers)?
The C standard states that there is no such thing as ill-behaved integer
overflow for unsigned numbers: operations are well-defined to wrap around
modulo 2^width. It is only for signed integers that overflow is undefined
behavior (an implementation may wrap, saturate, raise a trap, or do
something else), and thus only there does an implementation have leeway
to provide flags like -ftrapv for controlling that behavior in a
beneficial manner.

It is still possible to compute whether an unsigned operation would
overflow; you just can't get the compiler to do it for you, so the code
becomes more verbose.
Post by Bruno Haible
And what about the gnulib 'xsize' module for checked size_t computations?
As long as the manual checks are correct, it should be fine.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Paul Eggert
2016-12-15 00:26:23 UTC
Post by Bruno Haible
Are you saying that -fsanitize=undefined or -fsanitize=signed-integer-overflow
(or -ftrapv, when using an older GCC) can detect integer overflow for signed
integers, whereas no such option exists and won't exist for unsigned integers
(because there are so many pieces of code that intentionally do a mod-2^32
or mod-2^64 computation on unsigned integers?
Yes. And as Eric says, arithmetic is well defined for unsigned integers,
so long as their values don't all fit in 'int' (unsigned types narrower
than int are promoted to signed int, whose overflow is undefined).
Post by Bruno Haible
what about the gnulib 'xsize' module for checked size_t computations?
That module should work if INT_MAX < SIZE_MAX, which is a safe
assumption as lots of code would break on a platform where SIZE_MAX <=
INT_MAX. I suppose it wouldn't hurt to add 'verify (INT_MAX <
SIZE_MAX);' to document the assumption.

Oh -- perhaps you're asking whether it's a good idea to use xsum with
size_t rather than native addition with ptrdiff_t. Although xsize.h is
portable and nicely maps size overflows to memory-allocation errors, it
has some downsides. With xsize, xsum calls are a bit harder to read
than simple uses of +. With xsize, programming errors are less likely
to be caught by -fsanitize=undefined. But perhaps the biggest issue is
that xsize does not prevent errors due to sizes greater than
PTRDIFF_MAX. Although this problem is not unique to xsize (lots of
gnulib and GNU core utilities have it) it's a shame that xsize doesn't
prevent it, since the problem is in xsize's wheelhouse.

Come to think of it, I suppose we should change xalloc_oversized to
report an overflow if the resulting size would be greater than
PTRDIFF_MAX. That should catch more potential problems in Gnulib and in
Gnulib-using code.

Here is an example of why arrays larger than PTRDIFF_MAX bytes can cause
real problems. On a 32-bit host, the program below calls malloc (2**31),
because n == 2**30 and sizeof (short) is 2. Assuming malloc succeeds,
one might naively think that diff returns 2**30 (which does fit in
ptrdiff_t) and that 'main' therefore returns 1. But this does not
happen, at least not on my platform (Fedora 24, gcc 6.2.1). When
compiled with gcc -O2, 'main' returns zero without allocating anything!
'main' can do so because subtracting two pointers that differ by more
than PTRDIFF_MAX bytes results in undefined behavior, so GCC can
generate whatever code it wants.

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

ptrdiff_t
diff (short *a, short *b)
{
  return a - b;
}

int
main (void)
{
  size_t n = PTRDIFF_MAX / sizeof (short) + 1;
  short *x = malloc (n * sizeof (short));
  return 0 < diff (x + n, x);
}
Paul Eggert
2016-12-15 01:11:53 UTC
I suppose we should change xalloc_oversized to report an overflow if
the resulting size would be greater than PTRDIFF_MAX. That should
catch more potential problems in Gnulib and in Gnulib-using code.
Attached is a proposed patch to do that.
Bruno Haible
2016-12-15 02:17:29 UTC
Post by Paul Eggert
Come to think of it, I suppose we should change xalloc_oversized to
report an overflow if the resulting size would be greater than
PTRDIFF_MAX. That should catch more potential problems in Gnulib and in
Gnulib-using code.
...
Here is an example of why arrays larger than PTRDIFF_MAX bytes can cause
real problems.
So, an attempt to allocate a 2.5 GB memory chunk on 32-bit Linux/x86
is already ill-fated? That is, SIZE_MAX = 4 GB - 1 gives us the false
illusion that allocations > 2 GB would be OK, when in fact they are not
OK; so it's really ssize_t or ptrdiff_t that matters?!

Bruno
Bruno Haible
2016-12-15 10:09:49 UTC
Post by Paul Eggert
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
ptrdiff_t
diff (short *a, short *b)
{
  return a - b;
}

int
main (void)
{
  size_t n = PTRDIFF_MAX / sizeof (short) + 1;
  short *x = malloc (n * sizeof (short));
  return 0 < diff (x + n, x);
}
I can reproduce it, on Ubuntu with "gcc -m32", even with a 2.5 GB allocation:
size_t n = PTRDIFF_MAX / sizeof (short) * 1.25;
'ltrace' shows that malloc succeeds.

The code of the 'diff' function shows a signed shift:

diff:
        movl    4(%esp), %eax
        subl    8(%esp), %eax
        sarl    %eax
        ret

And indeed, an unsigned shift is not possible here because the compiler
doesn't know whether to expect a >= b or a <= b.

So, the limiting factor is the pointer difference operator
ptr1 - ptr2 where sizeof *ptr1 > 1.

Consequences:

* We have no problem with code that only works with indices and never does
pointer differences or pointer comparisons.

  for (i = 0; i < n; i++)
    do_something (&array[i]);

is better than

  array_end = &array[n];
  for (p = array; p < array_end; p++)
    do_something (p);

But does GCC's strength-reduction optimization know this?

* We have no problem with strings, because sizeof (char) == 1.

Bruno
Paul Eggert
2016-12-15 18:21:40 UTC
Post by Bruno Haible
So, the limiting factor is the pointer difference operator
ptr1 - ptr2 where sizeof *ptr1 > 1.
Yes, it is the pointer difference operator. However, the problem occurs
even with size-1 array elements. For example:

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

ptrdiff_t
diff (char *a, char *b)
{
  return a - b;
}

int
main (void)
{
  size_t n = PTRDIFF_MAX / 2 + 1;
  size_t size = 2 * n;
  char *x = malloc (size);
  return 0 < diff (x + size, x);
}

'main' returns 0 on Fedora 24 (x86-64 or x86).
Post by Bruno Haible
* We have no problem with code that only works with indices and never does
pointer differences or pointer comparisons.
I don't see a problem with pointer comparisons, just pointer differences.
Post by Bruno Haible
* We have no problem with strings, because sizeof (char) == 1.
No, unfortunately large strings do not work, as one cannot reliably
compute differences of pointers to their elements.

One possibility would be to have two flavors of xalloc_oversized. One
flavor would check for both ptrdiff_t overflow and size_t overflow, for
programs that do pointer subtraction, and the other flavor
(yalloc_oversized, say?) would check only for size_t overflow, for
programs that never subtract pointers to the allocated storage. All
current functions like xnmalloc could have two flavors, so that xnmalloc
checks for both kinds of overflow and ynmalloc checks only for size_t
overflow. It's not clear to me whether it's worth going to all that
effort merely to support 3 GiB arrays in 32-bit applications. In the
meantime, I installed the patch I proposed yesterday, along with the
additional patches attached, which merely change the x* functions to
check for both kinds of overflow.
Bruno Haible
2016-12-15 23:55:09 UTC
Hi Paul,
Post by Paul Eggert
I installed the patch I proposed yesterday, along with the
additional patches attached, which merely change the x* functions to
check for both kinds of overflow.
These changes give me some stomach-ache. I perfectly understand that
you're making a departure from 20 years of C tradition, to get overflows
detected better. I see three issues:

1) You're basically saying "let's use signed integer types for indices",
and you do that in the quotearg.c change.

1.1) The declaration 'unsigned int i;' also served as information for
the programmer: "Attention: never assign negative values to this variable!"
Changing it to 'int i;' hampers maintainability.
This could be solved through a typedef
typedef int index_t; /* keep always >= 0 */
but such a typedef doesn't solve 1.2).

1.2) Essentially, the only remaining use of 'unsigned int' is for packed
bit arrays and bit masks, and for multi-precision arithmetic. But as C
programmer I do want to have integer variables which are always >= 0,
<= PTRDIFF_MAX, <= SIZE_MAX, _and_ I would like to have them checked
against overflow.

For this purpose, it would be good if GCC had a type, say, __gcc_index_t,
for which -fsanitize=undefined would produce a diagnostic if a value < 0
or > PTRDIFF_MAX is assigned.

2) The type __xalloc_count_type is sometimes signed, sometimes unsigned,
depending on platform (like 'char' and 'wchar_t').
Such types produce extra trouble:
* Some bugs occur on only one kind of platform and cannot be reproduced,
  even with the best instrumentation (-fsanitize=undefined, valgrind, etc.),
  on the other kind of platform.
* Code which is necessary for one platform produces warnings on the
  other platforms. Such as:

    wchar_t wc = ...;
    if (wc >= 0x0000 && wc < 0x110000)

  This produces a warning on platforms where wchar_t is unsigned.

3) The type defined in xalloc-oversized.h has much wider applicability:

#if PTRDIFF_MAX < SIZE_MAX
typedef ptrdiff_t __xalloc_count_type;
#else
typedef size_t __xalloc_count_type;
#endif

It becomes one of the basic C types and should therefore deserve a
name with wider scope. Possibly 'gl_index_t' or such.

Bruno
Bruno Haible
2016-12-16 00:02:18 UTC
Post by Bruno Haible
For this purpose, it would be good if GCC had a type, say, __gcc_index_t,
for which -fsanitize=undefined would produce a diagnostic if a value < 0
or > PTRDIFF_MAX is assigned.
Actually, this is a special case of a range type. If we could have Ada's range
types [1] in C, with overflow check enabled by -ftrapv or -fsanitize=undefined,
that would be a *great* improvement. Especially as many GNU packages use
C as an application programming language rather than as a system programming
language.

Bruno

[1] https://en.wikibooks.org/wiki/Ada_Programming/Types/range
Paul Eggert
2016-12-16 07:35:40 UTC
Post by Bruno Haible
1) You're basically saying "let's use signed integer types for indices",
and you do that in the quotearg.c change.
Yes. This is not my invention; it's common in C programs generally to use int
for indexes. Using ptrdiff_t for indexes is the preferred coding style in the C
core of Emacs, where the maintainers don't want to use unsigned integers except
for specialized uses like bitmasks. (ptrdiff_t is obviously a safer choice than
int.)
Post by Bruno Haible
2) The type __xalloc_count_type is sometimes signed, sometimes unsigned,
depending on platform (like 'char' and 'wchar_t').
True, though it is signed on all practical platforms that I know about. The
unsigned alternative is in some sense merely a hypothetical one.
Post by Bruno Haible
It becomes one of the basic C types and should therefore deserve a
name with wider scope.
I'm not sure I'd go that far. Emacs simply used ptrdiff_t for indexes, and this
works well. ptrdiff_t works for all values that xalloc_oversized accepts.
__xalloc_count_type exists merely to support efficient checking for oversized
values even on weird platforms where SIZE_MAX < PTRDIFF_MAX, and this
quite-specialized use doesn't need to leak out into applications -- at least,
I've never felt the need for it in Emacs.

Paul Eggert
2016-12-14 23:14:26 UTC
I somehow managed to install the wrong version. Sorry about that. Should
be fixed by the attached.