Discussion:
[libvirt] gnulib tests in libvirt broken by newer glibc 2.26
Eric Blake
2017-09-27 22:25:24 UTC
Permalink
[adding gnulib]
Hi,
there seems to be an incompatibility to the last glibc due to [1].
Gnulib needs to be updated to track the glibc changes (it looks like
that is actually under way already), then libvirt needs to pick up the
updated gnulib. I'll leave the rest of your email intact in case it
helps gnulib readers, but I'll probably help resolve the issue when I
get a chance.

I'm assuming you hit this failure on rawhide, as that is the platform
most likely to be using bleeding-edge glibc with the getopt changes?
Eventually this breaks gnulib unittests (and maybe more).
Debugging went from an assert, to bidngin different symbols, to changed
function names to different header resolution.
Because it expects it to behave "posixly" but arguments are changed
differently.
FAIL: test-getopt-posix
=======================
../../../../gnulib/tests/test-getopt.h:754: assertion 'strcmp (argv[1],
"donald") == 0' failed
# get and build latest libvirt to get the local gnulib
$ wget http://libvirt.org/sources/libvirt-3.7.0.tar.xz
$ tar xf libvirt-3.7.0.tar.xz
$ cd libvirt
$ apt build-dep libvirt
$ ./autogen.sh
$ make -j4
You can run the following simplified test derived from the unit test (it is
much shorter, so easier to debug e.g. in -E).
$ cat << EOF >> test1.c
#include <config.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
return 0;
}
EOF
$ gcc -I ./gnulib/lib -I . test1.c -H
You can see in -H output already the difference in the paths of the headers
Glibc <2.26
. ./config.h
.. ./config-post.h
. /usr/include/unistd.h
[...]
.. /usr/include/getopt.h
Glibc >=2.26
. ./config.h
.. ./config-post.h
. ./gnulib/lib/unistd.h
[...]
... /usr/include/x86_64-linux-gnu/bits/getopt_posix.h
.... /usr/include/x86_64-linux-gnu/bits/getopt_core.
If you build with -E you'll also see that it now uses getopt from glibc
instead of the prefixed rpl_getopt from gnulib.
It behaves as if it would not find gnulib and instead fall back to glibc.
But it finds gnulib, instead due to glibc's changes its implementation is
fetched before and due to that pulling in glibc's behavior while the unit
test is verifying against the one it expects from gnulib.
Sorry, but I don't see the right fix here yet - I could easily silence the
test but that is no fix. Especially if there might be implications due to
handling the args (slightly) differently.
I really wanted to come up with the same test against gnulib alone, but I
was lost in the build system for too long and could not get the example to
fail without libvirt (OTOH I'm sure it would).
Therefore I'm reaching out to you for your help and experience on the build
system what could be done.
[1]: https://sourceware.org/ml/libc-alpha/2017-04/msg00115.html
--
libvir-list mailing list
https://www.redhat.com/mailman/listinfo/libvir-list
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Christian Ehrhardt
2017-09-28 12:05:35 UTC
Permalink
Post by Eric Blake
[adding gnulib]
Hi,
there seems to be an incompatibility to the last glibc due to [1].
Gnulib needs to be updated to track the glibc changes (it looks like
that is actually under way already),
Yeah that seems to be part of the effort I linked.
Post by Eric Blake
then libvirt needs to pick up the
updated gnulib.
I copied in current gnulib from git and it didn't resolve yet. But maybe it
is just something that still is work in progress.
Post by Eric Blake
I'll leave the rest of your email intact in case it
helps gnulib readers, but I'll probably help resolve the issue when I
get a chance.
I'm assuming you hit this failure on rawhide, as that is the platform
most likely to be using bleeding-edge glibc with the getopt changes?
It was Ubuntu 17.10 (Artful) which is on 2.26
Post by Eric Blake
Eventually this breaks gnulib unittests (and maybe more).
Debugging went from an assert, to bidngin different symbols, to changed
function names to different header resolution.
Because it expects it to behave "posixly" but arguments are changed
differently.
FAIL: test-getopt-posix
=======================
../../../../gnulib/tests/test-getopt.h:754: assertion 'strcmp (argv[1],
"donald") == 0' failed
# get and build latest libvirt to get the local gnulib
$ wget http://libvirt.org/sources/libvirt-3.7.0.tar.xz
$ tar xf libvirt-3.7.0.tar.xz
$ cd libvirt
$ apt build-dep libvirt
$ ./autogen.sh
$ make -j4
You can run the following simplified test derived from the unit test (it
is
much shorter, so easier to debug e.g. in -E).
$ cat << EOF >> test1.c
#include <config.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
return 0;
}
EOF
$ gcc -I ./gnulib/lib -I . test1.c -H
You can see in -H output already the difference in the paths of the
headers
Glibc <2.26
. ./config.h
.. ./config-post.h
. /usr/include/unistd.h
[...]
.. /usr/include/getopt.h
Glibc >=2.26
. ./config.h
.. ./config-post.h
. ./gnulib/lib/unistd.h
[...]
... /usr/include/x86_64-linux-gnu/bits/getopt_posix.h
.... /usr/include/x86_64-linux-gnu/bits/getopt_core.
If you build with -E you'll also see that it now uses getopt from glibc
instead of the prefixed rpl_getopt from gnulib.
It behaves as if it would not find gnulib and instead fall back to glibc.
But it finds gnulib, instead due to glibc's changes its implementation is
fetched before and due to that pulling in glibc's behavior while the unit
test is verifying against the one it expects from gnulib.
Sorry, but I don't see the right fix here yet - I could easily silence
the
test but that is no fix. Especially if there might be implications due to
handling the args (slightly) differently.
I really wanted to come up with the same test against gnulib alone, but I
was lost in the build system for too long and could not get the example
to
fail without libvirt (OTOH I'm sure it would).
Therefore I'm reaching out to you for your help and experience on the
build
system what could be done.
[1]: https://sourceware.org/ml/libc-alpha/2017-04/msg00115.html
--
libvir-list mailing list
https://www.redhat.com/mailman/listinfo/libvir-list
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
Christian Ehrhardt
2017-09-28 12:31:19 UTC
Permalink
On Thu, Sep 28, 2017 at 2:05 PM, Christian Ehrhardt <
Post by Eric Blake
Post by Eric Blake
[adding gnulib]
[...]
then libvirt needs to pick up the
Post by Eric Blake
updated gnulib.
I copied in current gnulib from git and it didn't resolve yet. But maybe
it is just something that still is work in progress.
Until there is a final and better solution is available the following patch
avoids the issue without fully skipping the test for now.
Since the issue seems only to apply to getopt, but not getopt_long and the
code doesn't use that in anything except examples it might be a valid
interim helper until glibc/gnulib sorted out how that is handled more
correctly.
--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
Paul Eggert
2017-09-28 23:41:37 UTC
Permalink
That patch essentially negates the point of the test, which is that
getopt should be visible from unistd.h. I'd rather fix the problem than
nuke the test.

Could you explain what the Gnulib problem is here? I can't really see it
in your email. A self-contained example would help.

For what it's worth, I could not reproduce the problem on Fedora 26 by
doing this in Gnulib (this tells 'configure' to use Gnulib-supplied
getopt.h and getopt.c):

./gnulib-tool --create-testdir --dir foo getopt-posix
cd foo
./configure gl_cv_func_getopt_posix=no ac_cv_header_getopt_h=no
make
make check
Daniel P. Berrange
2017-09-29 08:45:14 UTC
Permalink
That patch essentially negates the point of the test, which is that getopt
should be visible from unistd.h. I'd rather fix the problem than nuke the
test.
Could you explain what the Gnulib problem is here? I can't really see it in
your email. A self-contained example would help.
For what it's worth, I could not reproduce the problem on Fedora 26 by doing
this in Gnulib (this tells 'configure' to use Gnulib-supplied getopt.h and
Fedora 26 only has glibc 2.25 - you need to have Fedora rawhide to get
the broken behaviour, as that has glibc 2.26.90
./gnulib-tool --create-testdir --dir foo getopt-posix
cd foo
./configure gl_cv_func_getopt_posix=no ac_cv_header_getopt_h=no
make
make check
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Christian Ehrhardt
2017-09-29 12:02:20 UTC
Permalink
Post by Eric Blake
Post by Paul Eggert
That patch essentially negates the point of the test, which is that
getopt
Post by Paul Eggert
should be visible from unistd.h. I'd rather fix the problem than nuke the
test.
Could you explain what the Gnulib problem is here? I can't really see it
in
Post by Paul Eggert
your email. A self-contained example would help.
For what it's worth, I could not reproduce the problem on Fedora 26 by
doing
Post by Paul Eggert
this in Gnulib (this tells 'configure' to use Gnulib-supplied getopt.h
and
Fedora 26 only has glibc 2.25 - you need to have Fedora rawhide to get
the broken behaviour, as that has glibc 2.26.90
Post by Paul Eggert
./gnulib-tool --create-testdir --dir foo getopt-posix
cd foo
./configure gl_cv_func_getopt_posix=no ac_cv_header_getopt_h=no
make
make check
As Daniel said at least glibc 2.26 as in Fedora rawhide or Ubuntu Artful.
But thanks for these commands that way I could reproduce without needing
any of the libvirt build env.

Here [1] a log of your commands on such a system showing the issue.
I added a gdb to show the assert and an LD_DEBUG so you can see that getopt
is not taken from gnulib as the test assumes when verifying behavior.

[1]: http://paste.ubuntu.com/25638847/
Post by Eric Blake
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/
dberrange :|
|: https://libvirt.org -o-
https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/
dberrange :|
--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
Paul Eggert
2017-09-29 19:44:53 UTC
Permalink
Post by Christian Ehrhardt
Here [1] a log of your commands on such a system showing the issue.
Thanks, but I still don't understand what the bug is. With those
commands, the test programs use Gnulib-supplied getopt, not the glibc
getopt. So why would any change in glibc 2.26 getopt affect things?
Bruno Haible
2017-09-30 00:23:10 UTC
Permalink
Hi,
Post by Christian Ehrhardt
Post by Daniel P. Berrange
Fedora 26 only has glibc 2.25 - you need to have Fedora rawhide to get
the broken behaviour, as that has glibc 2.26.90
As Daniel said at least glibc 2.26 as in Fedora rawhide or Ubuntu Artful.
This tip is not helpful: I spent two hours trying Fedora Rawhide, and the
version of today is not functional: the live ISO shows a desktop in which
all you can do is to reboot, and the installation ISO does an installation
but the installed system does not boot.

So we are back to the support mode where we ask for pieces of information
for analysis by us (gnulib developers).
Post by Christian Ehrhardt
./configure gl_cv_func_getopt_posix=no ac_cv_header_getopt_h=no
This mode of configuration is not the usual one. (Why, if you are on a glibc
system, would you want to avoid the system's <getopt.h> header and getopt()
function?) But anyway, it should be supported, and the result should be
a 'test-getopt-posix' program that uses gnulib's replacement:

$ nm test-getopt-posix | grep getopt
000000000060a160 b getopt_data
0000000000405b90 T _getopt_internal_r
0000000000400f90 t getopt_loop.constprop.0
0000000000406170 T rpl_getopt
0000000000406110 T rpl_getopt_internal
0000000000401130 t test_getopt
Post by Christian Ehrhardt
[1]: http://paste.ubuntu.com/25638847/
Please provide info as mail attachments (compressed if necessary) in the
future. This site provides a "download as text" button which requests the
user's Launchpad credentials and then attempts to transmit them without
encryption (at least that's what I understand from the Firefox warning).
Post by Christian Ehrhardt
Here [1] a log of your commands on such a system showing the issue.
I added a gdb to show the assert
Thanks. The essential lines are:

checking for getopt.h... (cached) no
checking whether getopt is POSIX compatible... (cached) no
...
{ echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
sed -e 's|@''GUARD_PREFIX''@|GL|g' \
-e 's|@''HAVE_GETOPT_H''@|0|g' \
-e 's|@''INCLUDE_NEXT''@|include_next|g' \
-e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
-e 's|@''PRAGMA_COLUMNS''@||g' \
-e 's|@''NEXT_GETOPT_H''@|<getopt.h>|g' \
-e '/definition of _GL_ARG_NONNULL/r ./arg-nonnull.h' \
< ./getopt.in.h; \
} > getopt.h-t && \
...
gcc -DHAVE_CONFIG_H -I. -I.. -DGNULIB_STRICT_CHECKING=1 -g -O2 -MT getopt.o -MD -MP -MF .deps/getopt.Tpo -c -o getopt.o getopt.c
...
gcc -DHAVE_CONFIG_H -I. -I.. -DGNULIB_STRICT_CHECKING=1 -g -O2 -MT getopt1.o -MD -MP -MF .deps/getopt1.Tpo -c -o getopt1.o getopt1.c

So, the build attempts to use the gnulib override of getopt.

More essential lines:

Program received signal SIGABRT, Aborted.
#0 __GI_raise (sig=***@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff7a2df5d in __GI_abort () at abort.c:90
#2 0x0000555555558d23 in test_getopt () at test-getopt.h:754
#3 0x0000555555554b24 in main () at test-getopt-main.h:65

So, the test is executing test_getopt () with the variable POSIXLY_CORRECT
unset; but test_getopt () executes code that is conditional on 'posixly' being
true. Which means that __GETOPT_PREFIX must be defined (presumably to 'rpl_').

Can you show three things, please?

1) The output of
$ nm test-getopt-posix | grep getopt

2) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E test-getopt-posix.c

3) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E -dM test-getopt-posix.c

Thanks.

Bruno
Daniel P. Berrange
2017-10-06 09:10:17 UTC
Permalink
Post by Bruno Haible
Hi,
Post by Christian Ehrhardt
Post by Daniel P. Berrange
Fedora 26 only has glibc 2.25 - you need to have Fedora rawhide to get
the broken behaviour, as that has glibc 2.26.90
As Daniel said at least glibc 2.26 as in Fedora rawhide or Ubuntu Artful.
This tip is not helpful: I spent two hours trying Fedora Rawhide, and the
version of today is not functional: the live ISO shows a desktop in which
all you can do is to reboot, and the installation ISO does an installation
but the installed system does not boot.
So we are back to the support mode where we ask for pieces of information
for analysis by us (gnulib developers).
Post by Christian Ehrhardt
./configure gl_cv_func_getopt_posix=no ac_cv_header_getopt_h=no
This mode of configuration is not the usual one. (Why, if you are on a glibc
system, would you want to avoid the system's <getopt.h> header and getopt()
function?) But anyway, it should be supported, and the result should be
$ nm test-getopt-posix | grep getopt
000000000060a160 b getopt_data
0000000000405b90 T _getopt_internal_r
0000000000400f90 t getopt_loop.constprop.0
0000000000406170 T rpl_getopt
0000000000406110 T rpl_getopt_internal
0000000000401130 t test_getopt
This is what I see when building on glib < 2.26 (ie historically it
works as expected)

[snip]
Post by Bruno Haible
Can you show three things, please?
From my own F28 rawhide install with glibc-2.26.90-16.fc28.x86_64
Post by Bruno Haible
1) The output of
$ nm test-getopt-posix | grep getopt
$ nm test-getopt-posix | grep getopt
U getopt@@GLIBC_2.2.5
0000000000400ab0 t getopt_loop.constprop.0
0000000000400c50 t test_getopt
Post by Bruno Haible
2) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E test-getopt-posix.c
Attached in the file 'question-2.txt'
Post by Bruno Haible
3) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E -dM test-getopt-posix.c
Attached in the file 'question-3.txt'

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Bruno Haible
2017-10-06 17:59:13 UTC
Permalink
Post by Daniel P. Berrange
From my own F28 rawhide install with glibc-2.26.90-16.fc28.x86_64
Post by Bruno Haible
1) The output of
$ nm test-getopt-posix | grep getopt
$ nm test-getopt-posix | grep getopt
0000000000400ab0 t getopt_loop.constprop.0
0000000000400c50 t test_getopt
Post by Bruno Haible
2) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E test-getopt-posix.c
Attached in the file 'question-2.txt'
Post by Bruno Haible
3) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E -dM test-getopt-posix.c
Attached in the file 'question-3.txt'
Thanks. From this, I can see that getopt-pfx-core.h does not get included
at all. Please try this patch (or, if you don't want to re-run configure,
just remove the "&& !defined __GLIBC__" from the generated unistd.h).


diff --git a/lib/unistd.in.h b/lib/unistd.in.h
index 748112f..b5b6e0e 100644
--- a/lib/unistd.in.h
+++ b/lib/unistd.in.h
@@ -134,9 +134,8 @@
/* The definition of _GL_WARN_ON_USE is copied here. */


-/* Get getopt(), optarg, optind, opterr, optopt.
- But avoid namespace pollution on glibc systems. */
-#if @GNULIB_UNISTD_H_GETOPT@ && !defined __GLIBC__ && !defined _GL_SYSTEM_GETOPT
+/* Get getopt(), optarg, optind, opterr, optopt. */
+#if @GNULIB_UNISTD_H_GETOPT@ && !defined _GL_SYSTEM_GETOPT
# include <getopt-cdefs.h>
# include <getopt-pfx-core.h>
#endif
Daniel P. Berrange
2017-10-09 10:07:42 UTC
Permalink
Post by Bruno Haible
Post by Daniel P. Berrange
From my own F28 rawhide install with glibc-2.26.90-16.fc28.x86_64
Post by Bruno Haible
1) The output of
$ nm test-getopt-posix | grep getopt
$ nm test-getopt-posix | grep getopt
0000000000400ab0 t getopt_loop.constprop.0
0000000000400c50 t test_getopt
Post by Bruno Haible
2) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E test-getopt-posix.c
Attached in the file 'question-2.txt'
Post by Bruno Haible
3) The output of
$ gcc -DHAVE_CONFIG_H -DGNULIB_STRICT_CHECKING=1 -I. -I.. -I../gllib -g -O2 -E -dM test-getopt-posix.c
Attached in the file 'question-3.txt'
Thanks. From this, I can see that getopt-pfx-core.h does not get included
at all. Please try this patch (or, if you don't want to re-run configure,
just remove the "&& !defined __GLIBC__" from the generated unistd.h).
diff --git a/lib/unistd.in.h b/lib/unistd.in.h
index 748112f..b5b6e0e 100644
--- a/lib/unistd.in.h
+++ b/lib/unistd.in.h
@@ -134,9 +134,8 @@
/* The definition of _GL_WARN_ON_USE is copied here. */
-/* Get getopt(), optarg, optind, opterr, optopt.
- But avoid namespace pollution on glibc systems. */
+/* Get getopt(), optarg, optind, opterr, optopt. */
# include <getopt-cdefs.h>
# include <getopt-pfx-core.h>
#endif
Tested-by: Daniel P. Berrange <***@redhat.com>


Confirmed it fixes the failure on Fedora 28, and does not cause a regression
on Fedora 26 with older glibc.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Bruno Haible
2017-10-09 14:34:26 UTC
Permalink
Post by Daniel P. Berrange
Confirmed it fixes the failure on Fedora 28, and does not cause a regression
on Fedora 26 with older glibc.
Thanks. Pushing it:


2017-10-09 Bruno Haible <***@clisp.org>

getopt-posix: Fix build failure when using ac_cv_header_getopt_h=no.
Reported by Christian Ehrhardt <***@canonical.com>
and Daniel P. Berrange <***@redhat.com>.
* lib/unistd.in.h (getopt): Don't attempt to avoid namespace pollution
on glibc systems. The getopt-pfx-core.h file declares exactly what
unistd.h needs, nothing more.

diff --git a/lib/unistd.in.h b/lib/unistd.in.h
index 748112f..b5b6e0e 100644
--- a/lib/unistd.in.h
+++ b/lib/unistd.in.h
@@ -134,9 +134,8 @@
/* The definition of _GL_WARN_ON_USE is copied here. */


-/* Get getopt(), optarg, optind, opterr, optopt.
- But avoid namespace pollution on glibc systems. */
-#if @GNULIB_UNISTD_H_GETOPT@ && !defined __GLIBC__ && !defined _GL_SYSTEM_GETOPT
+/* Get getopt(), optarg, optind, opterr, optopt. */
+#if @GNULIB_UNISTD_H_GETOPT@ && !defined _GL_SYSTEM_GETOPT
# include <getopt-cdefs.h>
# include <getopt-pfx-core.h>
#endif

Loading...