[RFC PATCH v4 3/5] support record failure: flush stdout, use _exit ()

Discussion:

[RFC PATCH v4 3/5] support record failure: flush stdout, use _exit ()

Mathieu Desnoyers

2018-11-21 18:39:34 UTC

Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.

Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

Signed-off-by: Mathieu Desnoyers <***@efficios.com>
CC: Carlos O'Donell <***@redhat.com>
CC: Florian Weimer <***@redhat.com>
CC: Joseph Myers <***@codesourcery.com>
CC: Szabolcs Nagy <***@arm.com>
CC: libc-***@sourceware.org
---
support/check.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/support/check.c b/support/check.c
index 78f2b3cde1..bed1d20457 100644
--- a/support/check.c
+++ b/support/check.c
@@ -22,6 +22,7 @@
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
+#include <unistd.h>
#include <support/test-driver.h>

static void
@@ -56,5 +57,6 @@ support_exit_failure_impl (int status, const char *file, int line,
va_start (ap, format);
print_failure (file, line, format, ap);
va_end (ap);
- exit (status);
+ fflush (stdout);
+ _exit (status);
}

--
2.17.1

Mathieu Desnoyers

2018-11-21 18:39:35 UTC

Expose support_record_failure_init () so constructors can explicitly
initialize the record failure API.

This is preferred to lazy initialization at first use, because
lazy initialization does not cover use in constructors within
forked children processes (forked from parent constructor).

Signed-off-by: Mathieu Desnoyers <***@efficios.com>
CC: Carlos O'Donell <***@redhat.com>
CC: Florian Weimer <***@redhat.com>
CC: Joseph Myers <***@codesourcery.com>
CC: Szabolcs Nagy <***@arm.com>
CC: libc-***@sourceware.org
---
support/check.h | 3 +++
support/support_record_failure.c | 18 +++++++++++++-----
2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/support/check.h b/support/check.h
index e6765289f2..25aae52003 100644
--- a/support/check.h
+++ b/support/check.h
@@ -88,6 +88,9 @@ void support_test_verify_exit_impl (int status, const char *file, int line,
does not support reporting failures from a DSO. */
void support_record_failure (void);

+/* Initialize record failure. Calling this is only needed when
+ recording failures from constructors. */
+void support_record_failure_init (void);
/* Static assertion, under a common name for both C++ and C11. */
#ifdef __cplusplus
# define support_static_assert static_assert
diff --git a/support/support_record_failure.c b/support/support_record_failure.c
index 356798f556..947551d2c8 100644
--- a/support/support_record_failure.c
+++ b/support/support_record_failure.c
@@ -32,8 +32,12 @@
zero, the failure of a test can be detected.

The init constructor function below puts *state on a shared
- annonymous mapping, so that failure reports from subprocesses
- propagate to the parent process. */
+ anonymous mapping, so that failure reports from subprocesses
+ propagate to the parent process.
+
+ support_record_failure_init is exposed so it can be called explicitly
+ in case this API needs to be used from a constructor. */
+
struct test_failures
{
unsigned int counter;
@@ -41,10 +45,14 @@ struct test_failures
};
static struct test_failures *state;

-static __attribute__ ((constructor)) void
-init (void)
+__attribute__ ((constructor)) void
+support_record_failure_init (void)
{
- void *ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE,
+ void *ptr;
+
+ if (state != NULL)
+ return;
+ ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (ptr == MAP_FAILED)
{

--
2.17.1

Florian Weimer

2018-11-21 18:47:43 UTC

Post by Mathieu Desnoyers
+/* Initialize record failure. Calling this is only needed when
+ recording failures from constructors. */
+void support_record_failure_init (void);

This looks okay to me.

Thanks,
Florian

Mathieu Desnoyers

2018-11-21 18:39:33 UTC

When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable.

Benchmarks:

x86-64: Intel E5-2630 ***@2.40GHz, 16-core, hyperthreading

glibc sched_getcpu(): 13.7 ns (baseline)
glibc sched_getcpu() using rseq: 2.5 ns (speedup: 5.5x)
inline load cpuid from __rseq_abi TLS: 0.8 ns (speedup: 17.1x)

Signed-off-by: Mathieu Desnoyers <***@efficios.com>
CC: Carlos O'Donell <***@redhat.com>
CC: Florian Weimer <***@redhat.com>
CC: Joseph Myers <***@codesourcery.com>
CC: Szabolcs Nagy <***@arm.com>
CC: Thomas Gleixner <***@linutronix.de>
CC: Ben Maurer <***@fb.com>
CC: Peter Zijlstra <***@infradead.org>
CC: "Paul E. McKenney" <***@linux.vnet.ibm.com>
CC: Boqun Feng <***@gmail.com>
CC: Will Deacon <***@arm.com>
CC: Dave Watson <***@fb.com>
CC: Paul Turner <***@google.com>
CC: libc-***@sourceware.org
CC: linux-***@vger.kernel.org
CC: linux-***@vger.kernel.org
---
sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index b69eeda15c..e1a206075c 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -24,8 +24,8 @@
#endif
#include <sysdep-vdso.h>

-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
{
#ifdef __NR_getcpu
unsigned int cpu;
@@ -37,3 +37,24 @@ sched_getcpu (void)
return -1;
#endif
}
+
+#ifdef __NR_rseq
+#include <linux/rseq.h>
+
+extern __attribute__ ((tls_model ("initial-exec")))
+__thread volatile struct rseq __rseq_abi;
+
+int
+sched_getcpu (void)
+{
+ int cpu_id = __rseq_abi.cpu_id;
+
+ return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+ return vsyscall_sched_getcpu ();
+}
+#endif

--
2.17.1

Mathieu Desnoyers

2018-11-21 18:39:36 UTC

These tests validate that rseq is registered from various execution
contexts (main thread, constructor, destructor, other threads, other
threads created from constructor and destructor, forked process
(without exec), pthread_atfork handlers, pthread setspecific
destructors, C++ thread and process destructors, and signal handlers).

See the Linux kernel selftests for extensive rseq stress-tests.

Signed-off-by: Mathieu Desnoyers <***@efficios.com>
CC: Carlos O'Donell <***@redhat.com>
CC: Florian Weimer <***@redhat.com>
CC: Joseph Myers <***@codesourcery.com>
CC: Szabolcs Nagy <***@arm.com>
CC: Thomas Gleixner <***@linutronix.de>
CC: Ben Maurer <***@fb.com>
CC: Peter Zijlstra <***@infradead.org>
CC: "Paul E. McKenney" <***@linux.vnet.ibm.com>
CC: Boqun Feng <***@gmail.com>
CC: Will Deacon <***@arm.com>
CC: Dave Watson <***@fb.com>
CC: Paul Turner <***@google.com>
CC: libc-***@sourceware.org
---
nptl/Makefile | 2 +-
nptl/tst-rseq.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 347 insertions(+), 1 deletion(-)
create mode 100644 nptl/tst-rseq.c

diff --git a/nptl/Makefile b/nptl/Makefile
index 3a5dc80c65..797309f7a8 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -323,7 +323,7 @@ tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
tests-internal := tst-rwlock19 tst-rwlock20 \
tst-sem11 tst-sem12 tst-sem13 \
tst-barrier5 tst-signal7 tst-mutex8 tst-mutex8-static \
- tst-mutexpi8 tst-mutexpi8-static
+ tst-mutexpi8 tst-mutexpi8-static tst-rseq

xtests = tst-setuid1 tst-setuid1-static tst-setuid2 \
tst-mutexpp1 tst-mutexpp6 tst-mutexpp10
diff --git a/nptl/tst-rseq.c b/nptl/tst-rseq.c
new file mode 100644
index 0000000000..c55345e44d
--- /dev/null
+++ b/nptl/tst-rseq.c
@@ -0,0 +1,346 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Mathieu Desnoyers <***@efficios.com>, 2018.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* These tests validate that rseq is registered from various execution
+ contexts (main thread, constructor, destructor, other threads, other
+ threads created from constructor and destructor, forked process
+ (without exec), pthread_atfork handlers, pthread setspecific
+ destructors, C++ thread and process destructors, and signal handlers).
+
+ See the Linux kernel selftests for extensive rseq stress-tests. */
+
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <support/check.h>
+
+#if defined (__linux__) && defined (__NR_rseq)
+#define HAS_RSEQ
+#endif
+
+#ifdef HAS_RSEQ
+#include <linux/rseq.h>
+#include <pthread.h>
+#include <syscall.h>
+#include <stdlib.h>
+#include <error.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <signal.h>
+#include <atomic.h>
+
+static pthread_key_t rseq_test_key;
+
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+static int
+rseq_thread_registered (void)
+{
+ return __rseq_abi.cpu_id >= 0;
+}
+
+static int
+do_rseq_main_test (void)
+{
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ return rseq_thread_registered () ? 0 : 1;
+}
+
+static void
+cancel_routine (void *arg)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in cancel routine");
+}
+
+static int cancel_thread_ready;
+
+static void
+test_cancel_thread (void)
+{
+ pthread_cleanup_push (cancel_routine, NULL);
+ atomic_store_release (&cancel_thread_ready, 1);
+ for (;;)
+ usleep (100);
+ pthread_cleanup_pop (0);
+}
+
+static void *
+thread_function (void * arg)
+{
+ int i = (int) (intptr_t) arg;
+
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ if (i == 0)
+ test_cancel_thread ();
+ if (pthread_setspecific (rseq_test_key, (void *) 1l))
+ FAIL_EXIT1 ("error in pthread_setspecific");
+ return rseq_thread_registered () ? NULL : (void *) 1l;
+}
+
+static void
+sighandler (int sig)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in signal handler");
+}
+
+static int
+setup_signals (void)
+{
+ struct sigaction sa;
+
+ sigemptyset (&sa.sa_mask);
+ sigaddset (&sa.sa_mask, SIGUSR1);
+ sa.sa_flags = 0;
+ sa.sa_handler = sighandler;
+ if (sigaction (SIGUSR1, &sa, NULL) != 0)
+ {
+ FAIL_RET ("sigaction failure: %s", strerror (errno));
+ }
+ return 0;
+}
+
+#define N 7
+static const int t[N] = { 1, 2, 6, 5, 4, 3, 50 };
+
+static int
+do_rseq_threads_test (int nr_threads)
+{
+ pthread_t th[nr_threads];
+ int i;
+ int result = 0;
+ pthread_attr_t at;
+
+ if (pthread_attr_init (&at) != 0)
+ {
+ FAIL_RET ("attr_init failed");
+ }
+
+ if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0)
+ {
+ FAIL_RET ("attr_setstacksize failed");
+ }
+
+ cancel_thread_ready = 0;
+ for (i = 0; i < nr_threads; ++i)
+ if (pthread_create (&th[i], NULL, thread_function,
+ (void *) (intptr_t) i) != 0)
+ {
+ FAIL_EXIT1 ("creation of thread %d failed", i);
+ }
+
+ if (pthread_attr_destroy (&at) != 0)
+ {
+ FAIL_RET ("attr_destroy failed");
+ }
+
+ while (!atomic_load_acquire (&cancel_thread_ready))
+ usleep (100);
+
+ if (pthread_cancel (th[0]))
+ FAIL_EXIT1 ("error in pthread_cancel");
+
+ for (i = 0; i < nr_threads; ++i)
+ {
+ void *v;
+ if (pthread_join (th[i], &v) != 0)
+ {
+ printf ("join of thread %d failed\n", i);
+ result = 1;
+ }
+ else if (i != 0 && v != NULL)
+ {
+ printf ("join %d successful, but child failed\n", i);
+ result = 1;
+ }
+ else if (i == 0 && v == NULL)
+ {
+ printf ("join %d successful, child did not fail as expected\n", i);
+ result = 1;
+ }
+ }
+ return result;
+}
+
+static int
+sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len,
+ int flags, uint32_t sig)
+{
+ return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+static int
+rseq_available (void)
+{
+ int rc;
+
+ rc = sys_rseq (NULL, 0, 0, 0);
+ if (rc != -1)
+ FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
+ switch (errno)
+ {
+ case ENOSYS:
+ return 0;
+ case EINVAL:
+ return 1;
+ default:
+ FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
+ }
+}
+
+static int
+do_rseq_fork_test (void)
+{
+ int status;
+ pid_t pid, retpid;
+
+ pid = fork ();
+ switch (pid)
+ {
+ case 0:
+ if (do_rseq_main_test ())
+ FAIL_EXIT1 ("rseq not registered in child");
+ exit (0);
+ case -1:
+ FAIL_EXIT1 ("Unexpected fork error %s", strerror (errno));
+ }
+ retpid = TEMP_FAILURE_RETRY (waitpid (pid, &status, 0));
+ if (retpid != pid)
+ {
+ FAIL_EXIT1 ("waitpid returned %ld, expected %ld",
+ (long int) retpid, (long int) pid);
+ }
+ return WEXITSTATUS (status);
+}
+
+static int
+do_rseq_test (void)
+{
+ int i, result = 0;
+
+ if (!rseq_available ())
+ {
+ FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+ }
+ if (setup_signals ())
+ FAIL_EXIT1 ("error setting up signal handler");
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ if (do_rseq_main_test ())
+ result = 1;
+ for (i = 0; i < N; i++)
+ {
+ if (do_rseq_threads_test (t[i]))
+ result = 1;
+ }
+ if (do_rseq_fork_test ())
+ result = 1;
+ return result;
+}
+
+static void
+atfork_prepare (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork prepare");
+}
+
+static void
+atfork_parent (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork parent");
+}
+
+static void
+atfork_child (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork child");
+}
+
+static void
+rseq_key_destructor (void *arg)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread key destructor");
+}
+
+static void
+do_rseq_create_key (void)
+{
+ if (pthread_key_create (&rseq_test_key, rseq_key_destructor))
+ FAIL_EXIT1 ("error in pthread_key_create");
+}
+
+static void
+do_rseq_delete_key (void)
+{
+ if (pthread_key_delete (rseq_test_key))
+ FAIL_EXIT1 ("error in pthread_key_delete");
+}
+
+static void __attribute__ ((constructor))
+do_rseq_constructor_test (void)
+{
+ support_record_failure_init ();
+ do_rseq_create_key ();
+ if (pthread_atfork (atfork_prepare, atfork_parent, atfork_child))
+ FAIL_EXIT1 ("error calling pthread_atfork");
+ if (do_rseq_test ())
+ FAIL_EXIT1 ("rseq not registered within constructor");
+}
+
+static void __attribute__ ((destructor))
+do_rseq_destructor_test (void)
+{
+ if (do_rseq_test ())
+ FAIL_EXIT1 ("rseq not registered within destructor");
+ do_rseq_delete_key ();
+}
+
+/* Test C++ destructor called at thread and process exit. */
+void
+__call_tls_dtors (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in C++ thread/process exit destructor");
+}
+#else
+static int
+do_rseq_test (void)
+{
+ FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test");
+ return 0;
+}
+#endif
+
+static int
+do_test (void)
+{
+ return do_rseq_test ();
+}
+
+#include <support/test-driver.c>

--
2.17.1

Mathieu Desnoyers

2018-11-21 18:50:57 UTC

Post by Mathieu Desnoyers
These tests validate that rseq is registered from various execution
contexts (main thread, constructor, destructor, other threads, other
threads created from constructor and destructor, forked process
(without exec), pthread_atfork handlers, pthread setspecific
destructors, C++ thread and process destructors, and signal handlers).
See the Linux kernel selftests for extensive rseq stress-tests.

In the next round (v5), I'll also add atexit () callback testing,
which is missing here.

Thanks,

Mathieu

Post by Mathieu Desnoyers
---
nptl/Makefile | 2 +-
nptl/tst-rseq.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 347 insertions(+), 1 deletion(-)
create mode 100644 nptl/tst-rseq.c
diff --git a/nptl/Makefile b/nptl/Makefile
index 3a5dc80c65..797309f7a8 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -323,7 +323,7 @@ tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
tests-internal := tst-rwlock19 tst-rwlock20 \
tst-sem11 tst-sem12 tst-sem13 \
tst-barrier5 tst-signal7 tst-mutex8 tst-mutex8-static \
- tst-mutexpi8 tst-mutexpi8-static
+ tst-mutexpi8 tst-mutexpi8-static tst-rseq
xtests = tst-setuid1 tst-setuid1-static tst-setuid2 \
tst-mutexpp1 tst-mutexpp6 tst-mutexpp10
diff --git a/nptl/tst-rseq.c b/nptl/tst-rseq.c
new file mode 100644
index 0000000000..c55345e44d
--- /dev/null
+++ b/nptl/tst-rseq.c
@@ -0,0 +1,346 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* These tests validate that rseq is registered from various execution
+ contexts (main thread, constructor, destructor, other threads, other
+ threads created from constructor and destructor, forked process
+ (without exec), pthread_atfork handlers, pthread setspecific
+ destructors, C++ thread and process destructors, and signal handlers).
+
+ See the Linux kernel selftests for extensive rseq stress-tests. */
+
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <support/check.h>
+
+#if defined (__linux__) && defined (__NR_rseq)
+#define HAS_RSEQ
+#endif
+
+#ifdef HAS_RSEQ
+#include <linux/rseq.h>
+#include <pthread.h>
+#include <syscall.h>
+#include <stdlib.h>
+#include <error.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <signal.h>
+#include <atomic.h>
+
+static pthread_key_t rseq_test_key;
+
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+static int
+rseq_thread_registered (void)
+{
+ return __rseq_abi.cpu_id >= 0;
+}
+
+static int
+do_rseq_main_test (void)
+{
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ return rseq_thread_registered () ? 0 : 1;
+}
+
+static void
+cancel_routine (void *arg)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in cancel routine");
+}
+
+static int cancel_thread_ready;
+
+static void
+test_cancel_thread (void)
+{
+ pthread_cleanup_push (cancel_routine, NULL);
+ atomic_store_release (&cancel_thread_ready, 1);
+ for (;;)
+ usleep (100);
+ pthread_cleanup_pop (0);
+}
+
+static void *
+thread_function (void * arg)
+{
+ int i = (int) (intptr_t) arg;
+
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ if (i == 0)
+ test_cancel_thread ();
+ if (pthread_setspecific (rseq_test_key, (void *) 1l))
+ FAIL_EXIT1 ("error in pthread_setspecific");
+ return rseq_thread_registered () ? NULL : (void *) 1l;
+}
+
+static void
+sighandler (int sig)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in signal handler");
+}
+
+static int
+setup_signals (void)
+{
+ struct sigaction sa;
+
+ sigemptyset (&sa.sa_mask);
+ sigaddset (&sa.sa_mask, SIGUSR1);
+ sa.sa_flags = 0;
+ sa.sa_handler = sighandler;
+ if (sigaction (SIGUSR1, &sa, NULL) != 0)
+ {
+ FAIL_RET ("sigaction failure: %s", strerror (errno));
+ }
+ return 0;
+}
+
+#define N 7
+static const int t[N] = { 1, 2, 6, 5, 4, 3, 50 };
+
+static int
+do_rseq_threads_test (int nr_threads)
+{
+ pthread_t th[nr_threads];
+ int i;
+ int result = 0;
+ pthread_attr_t at;
+
+ if (pthread_attr_init (&at) != 0)
+ {
+ FAIL_RET ("attr_init failed");
+ }
+
+ if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0)
+ {
+ FAIL_RET ("attr_setstacksize failed");
+ }
+
+ cancel_thread_ready = 0;
+ for (i = 0; i < nr_threads; ++i)
+ if (pthread_create (&th[i], NULL, thread_function,
+ (void *) (intptr_t) i) != 0)
+ {
+ FAIL_EXIT1 ("creation of thread %d failed", i);
+ }
+
+ if (pthread_attr_destroy (&at) != 0)
+ {
+ FAIL_RET ("attr_destroy failed");
+ }
+
+ while (!atomic_load_acquire (&cancel_thread_ready))
+ usleep (100);
+
+ if (pthread_cancel (th[0]))
+ FAIL_EXIT1 ("error in pthread_cancel");
+
+ for (i = 0; i < nr_threads; ++i)
+ {
+ void *v;
+ if (pthread_join (th[i], &v) != 0)
+ {
+ printf ("join of thread %d failed\n", i);
+ result = 1;
+ }
+ else if (i != 0 && v != NULL)
+ {
+ printf ("join %d successful, but child failed\n", i);
+ result = 1;
+ }
+ else if (i == 0 && v == NULL)
+ {
+ printf ("join %d successful, child did not fail as expected\n", i);
+ result = 1;
+ }
+ }
+ return result;
+}
+
+static int
+sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len,
+ int flags, uint32_t sig)
+{
+ return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+static int
+rseq_available (void)
+{
+ int rc;
+
+ rc = sys_rseq (NULL, 0, 0, 0);
+ if (rc != -1)
+ FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
+ switch (errno)
+ {
+ return 0;
+ return 1;
+ FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
+ }
+}
+
+static int
+do_rseq_fork_test (void)
+{
+ int status;
+ pid_t pid, retpid;
+
+ pid = fork ();
+ switch (pid)
+ {
+ if (do_rseq_main_test ())
+ FAIL_EXIT1 ("rseq not registered in child");
+ exit (0);
+ FAIL_EXIT1 ("Unexpected fork error %s", strerror (errno));
+ }
+ retpid = TEMP_FAILURE_RETRY (waitpid (pid, &status, 0));
+ if (retpid != pid)
+ {
+ FAIL_EXIT1 ("waitpid returned %ld, expected %ld",
+ (long int) retpid, (long int) pid);
+ }
+ return WEXITSTATUS (status);
+}
+
+static int
+do_rseq_test (void)
+{
+ int i, result = 0;
+
+ if (!rseq_available ())
+ {
+ FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+ }
+ if (setup_signals ())
+ FAIL_EXIT1 ("error setting up signal handler");
+ if (raise (SIGUSR1))
+ FAIL_EXIT1 ("error raising signal");
+ if (do_rseq_main_test ())
+ result = 1;
+ for (i = 0; i < N; i++)
+ {
+ if (do_rseq_threads_test (t[i]))
+ result = 1;
+ }
+ if (do_rseq_fork_test ())
+ result = 1;
+ return result;
+}
+
+static void
+atfork_prepare (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork prepare");
+}
+
+static void
+atfork_parent (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork parent");
+}
+
+static void
+atfork_child (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread atfork child");
+}
+
+static void
+rseq_key_destructor (void *arg)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in pthread key destructor");
+}
+
+static void
+do_rseq_create_key (void)
+{
+ if (pthread_key_create (&rseq_test_key, rseq_key_destructor))
+ FAIL_EXIT1 ("error in pthread_key_create");
+}
+
+static void
+do_rseq_delete_key (void)
+{
+ if (pthread_key_delete (rseq_test_key))
+ FAIL_EXIT1 ("error in pthread_key_delete");
+}
+
+static void __attribute__ ((constructor))
+do_rseq_constructor_test (void)
+{
+ support_record_failure_init ();
+ do_rseq_create_key ();
+ if (pthread_atfork (atfork_prepare, atfork_parent, atfork_child))
+ FAIL_EXIT1 ("error calling pthread_atfork");
+ if (do_rseq_test ())
+ FAIL_EXIT1 ("rseq not registered within constructor");
+}
+
+static void __attribute__ ((destructor))
+do_rseq_destructor_test (void)
+{
+ if (do_rseq_test ())
+ FAIL_EXIT1 ("rseq not registered within destructor");
+ do_rseq_delete_key ();
+}
+
+/* Test C++ destructor called at thread and process exit. */
+void
+__call_tls_dtors (void)
+{
+ if (!rseq_thread_registered ())
+ FAIL_EXIT1 ("rseq not registered in C++ thread/process exit destructor");
+}
+#else
+static int
+do_rseq_test (void)
+{
+ FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test");
+ return 0;
+}
+#endif
+
+static int
+do_test (void)
+{
+ return do_rseq_test ();
+}
+
+#include <support/test-driver.c>
--
2.17.1

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-21 18:50:16 UTC

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.

It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Thanks,
Florian

Mathieu Desnoyers

2018-11-21 18:52:05 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.
It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Calling exit() from a pthread_atfork handler unfortunately seems to hang
the process :-/

What would you recommend to deal with that situation ?

Thanks,

Mathieu

Post by Florian Weimer
Thanks,
Florian

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Mathieu Desnoyers

2018-11-21 18:55:11 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.
It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Calling exit() from a pthread_atfork handler unfortunately seems to hang
the process :-/
What would you recommend to deal with that situation ?

Or do you mean _not_ exiting from the pthread_atfork handlers, but just
record the error there, and continue normally, then catch the error in
the parent ?

Thanks,

Mathieu

Post by Mathieu Desnoyers
Thanks,
Mathieu

Post by Florian Weimer
Thanks,
Florian

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-21 18:56:16 UTC

Post by Mathieu Desnoyers

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.
It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Calling exit() from a pthread_atfork handler unfortunately seems to hang
the process :-/
What would you recommend to deal with that situation ?

Or do you mean _not_ exiting from the pthread_atfork handlers, but just
record the error there, and continue normally, then catch the error in
the parent ?

Yes, support_record_failure is for delayed failure reporting.

Thanks,
Florian

Mathieu Desnoyers

2018-11-21 22:23:00 UTC

Post by Florian Weimer

----- On Nov 21, 2018, at 1:52 PM, Mathieu Desnoyers

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.
It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Calling exit() from a pthread_atfork handler unfortunately seems to hang
the process :-/
What would you recommend to deal with that situation ?

Or do you mean _not_ exiting from the pthread_atfork handlers, but just
record the error there, and continue normally, then catch the error in
the parent ?

Yes, support_record_failure is for delayed failure reporting.

Good. I'll favor returning errors back to do_test () whenever possible.

When it's not, I'll use delayed failure reporting rather that FAIL_EXIT1 ()
when the test needs to report an error and has not way to return it to
do_test ().

I'll keep FAIL_EXIT1 for "setup" errors that are outside of the scope of
what the test is actually testing, and also for execution contexts running
after main exits (atexit handler, destructors, pthread key destroy,
__call_tls_dtors.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-21 18:55:41 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using "exit ()" from pthread_atfork handlers hangs the process. It is
therefore not a good way to exit when reporting a testing error.
Use _exit () instead, which directly exits the process. However,
considering that the buffered stdout is used to output test failure
messages, we first need to flush stdout before exiting.

This is not correct; we need the atexit handlers for cleaning up
temporary files.
It should be possible to call support_record_failure and rely on the
shared memory segment to register the test failure, so that it is
eventually reflected in the exit state (even if the failure happens in
the subprocess).

Calling exit() from a pthread_atfork handler unfortunately seems to hang
the process :-/
What would you recommend to deal with that situation ?

Why do you want to terminate the process there? Is this part of the
test?

Thanks,
Florian

Rich Felker

2018-11-22 14:36:03 UTC

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

Maybe I'm missing something obvious, but "unregister" does not seem to
be a meaningful operation. Can you clarify what it's for?

Rich

Mathieu Desnoyers

2018-11-22 15:04:16 UTC

Post by Rich Felker

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

Maybe I'm missing something obvious, but "unregister" does not seem to
be a meaningful operation. Can you clarify what it's for?

There are really two ways rseq TLS can end up being unregistered: either
through an explicit call to the rseq "unregister", or when the OS frees the
thread's task struct.

You bring an interesting point here: do we need to explicitly unregister
rseq at thread exit, or can we leave that to the OS ?

The key thing to look for here is whether it's valid to access the
TLS area of the thread from preemption or signal delivery happening
at the very end of START_THREAD_DEFN. If it's OK to access it until
the very end of the thread lifetime, then we could do without an
explicit unregistration. However, if at any given point of the late
thread lifetime we end up in a situation where reading or writing to
that TLS area can cause corruption, then we need to carefully
unregister it before that memory is reclaimed/reused.

What we have below the current location for the __rseq_unregister_current_thread ()
call is as follows. I'm not all that convinced that it's valid to access the TLS
area up until __exit_thread () at the very end, especially after setting
setxid_futex back to 0.

Thoughts ?

/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();

advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);

/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);
else if (__glibc_unlikely (pd->cancelhandling & SETXID_BITMASK))
{
/* Some other thread might call any of the setXid functions and expect
us to reply. In this case wait until we did that. */
do
/* XXX This differs from the typical futex_wait_simple pattern in that
the futex_wait condition (setxid_futex) is different from the
condition used in the surrounding loop (cancelhandling). We need
to check and document why this is correct. */
futex_wait_simple (&pd->setxid_futex, 0, FUTEX_PRIVATE);
while (pd->cancelhandling & SETXID_BITMASK);

/* Reset the value so that the stack can be reused. */
pd->setxid_futex = 0;
}

/* We cannot call '_exit' here. '_exit' will terminate the process.

The 'exit' implementation in the kernel will signal when the
process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
flag. The 'tid' field in the TCB will be set to zero.

The exit code is zero since in case all threads exit by calling
'pthread_exit' the exit status must be 0 (zero). */
__exit_thread ();

/* NOTREACHED */

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-22 15:11:45 UTC

Post by Mathieu Desnoyers
Thoughts ?
/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)

Wouldn't this address your concern about access to the rseq area?

Thanks,
Florian

Rich Felker

2018-11-22 15:17:10 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers
Thoughts ?
/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)
Wouldn't this address your concern about access to the rseq area?

I'm not familiar with glibc's logic here, but for other reasons, I
don't think freeing it is safe until the kernel task exit futex (set
via clone or set_tid_address) has fired. I would guess __free_tcb just
sets up for it to be reclaimable when this happens rather than
immediately freeing it for reuse.

Rich

Florian Weimer

2018-11-22 15:21:02 UTC

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
Thoughts ?
/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)
Wouldn't this address your concern about access to the rseq area?

I'm not familiar with glibc's logic here, but for other reasons, I
don't think freeing it is safe until the kernel task exit futex (set
via clone or set_tid_address) has fired. I would guess __free_tcb just
sets up for it to be reclaimable when this happens rather than
immediately freeing it for reuse.

Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Thanks,
Florian

Mathieu Desnoyers

2018-11-22 15:33:19 UTC

Post by Florian Weimer

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
Thoughts ?
/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)
Wouldn't this address your concern about access to the rseq area?

I'm not familiar with glibc's logic here, but for other reasons, I
don't think freeing it is safe until the kernel task exit futex (set
via clone or set_tid_address) has fired. I would guess __free_tcb just
sets up for it to be reclaimable when this happens rather than
immediately freeing it for reuse.

Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.

With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.

So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-22 15:44:57 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
Thoughts ?
/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)
Wouldn't this address your concern about access to the rseq area?

I'm not familiar with glibc's logic here, but for other reasons, I
don't think freeing it is safe until the kernel task exit futex (set
via clone or set_tid_address) has fired. I would guess __free_tcb just
sets up for it to be reclaimable when this happens rather than
immediately freeing it for reuse.

Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.
With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.
So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

OK, that makes sense. I was wrongly under the impression that the TLS
memory could not be reused until the task exit futex fired, but in
glibc that's not the case with caller-provided stacks.

I still don't understand the need for a reference count though.

Rich

Szabolcs Nagy

2018-11-22 16:24:56 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer
Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.
With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.
So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

i think the man page should point this out.

the memory of a registered rseq object must not be freed
before thread exit. (either unregister it or free later)

and ideally also point out that c language thread storage
duration does not provide this guarantee: it may be freed
by the implementation before thread exit (which is currently

Mathieu Desnoyers

2018-11-22 18:35:44 UTC

Post by Szabolcs Nagy

Post by Mathieu Desnoyers

Post by Florian Weimer
Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.
With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.
So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

i think the man page should point this out.

Yes, I should add this to the proposed rseq(2) man page.

Post by Szabolcs Nagy
the memory of a registered rseq object must not be freed
before thread exit. (either unregister it or free later)
and ideally also point out that c language thread storage
duration does not provide this guarantee: it may be freed
by the implementation before thread exit (which is currently
not observable, but with the rseq syscall it is).

How about the following wording ?

Memory of a registered rseq object must not be freed before the
thread exits. Reclaim of rseq object's memory must only be
done after either an explicit rseq unregistration is performed
or after the thread exit. Keep in mind that the implementation
of the Thread-Local Storage (C language __thread) lifetime does
not guarantee existence of the TLS area up until the thread exits.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-22 19:01:43 UTC

Post by Mathieu Desnoyers

Post by Szabolcs Nagy

Post by Mathieu Desnoyers

Post by Florian Weimer
Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.
With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.
So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

i think the man page should point this out.

Yes, I should add this to the proposed rseq(2) man page.

Post by Szabolcs Nagy
the memory of a registered rseq object must not be freed
before thread exit. (either unregister it or free later)
and ideally also point out that c language thread storage
duration does not provide this guarantee: it may be freed
by the implementation before thread exit (which is currently
not observable, but with the rseq syscall it is).

How about the following wording ?
Memory of a registered rseq object must not be freed before the
thread exits. Reclaim of rseq object's memory must only be
done after either an explicit rseq unregistration is performed
or after the thread exit. Keep in mind that the implementation
of the Thread-Local Storage (C language __thread) lifetime does
not guarantee existence of the TLS area up until the thread exits.

This is all really ugly for application/library code to have to deal
with. Maybe if the man page is considered as documenting the syscall
only, and not something you can use, it's okay, but "until the thread
exits" is not well-defined in the sense you want it here. It's more
like "until the kernel task for the thread exits", and the whole point
is that there is some interval in time between the abstract thread
exit and the kernel task exit that is not observable without rseq but
is observable if the rseq is wrongly left installed.

Rich

Mathieu Desnoyers

2018-11-22 19:36:25 UTC

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Szabolcs Nagy

Post by Mathieu Desnoyers

Post by Florian Weimer
Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Unfortuntately, disabling signals is not enough.
With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.
So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

i think the man page should point this out.

Yes, I should add this to the proposed rseq(2) man page.

Post by Szabolcs Nagy
the memory of a registered rseq object must not be freed
before thread exit. (either unregister it or free later)
and ideally also point out that c language thread storage
duration does not provide this guarantee: it may be freed
by the implementation before thread exit (which is currently
not observable, but with the rseq syscall it is).

How about the following wording ?
Memory of a registered rseq object must not be freed before the
thread exits. Reclaim of rseq object's memory must only be
done after either an explicit rseq unregistration is performed
or after the thread exit. Keep in mind that the implementation
of the Thread-Local Storage (C language __thread) lifetime does
not guarantee existence of the TLS area up until the thread exits.

This is all really ugly for application/library code to have to deal
with. Maybe if the man page is considered as documenting the syscall
only, and not something you can use, it's okay,

This is indeed for the rseq(2) manpage targeting the man-pages project,
which documents system calls.

Post by Rich Felker
but "until the thread
exits" is not well-defined in the sense you want it here. It's more
like "until the kernel task for the thread exits", and the whole point
is that there is some interval in time between the abstract thread
exit and the kernel task exit that is not observable without rseq but
is observable if the rseq is wrongly left installed.

It's important to clear a possible misunderstanding here: from the
point where the thread issues the "exit" system call, the kernel won't
touch the registered rseq TLS area anymore.

So the point where the thread exits is actually well defined, even from
a user-space perspective.

The problematic scenario arises when glibc frees the TLS memory
before invoking exit() when the thread terminates. In this kind of
scenario, we need to explicitly invoke rseq unregister before TLS
memory reclaim.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-22 15:14:44 UTC

Post by Mathieu Desnoyers

Post by Rich Felker

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

Maybe I'm missing something obvious, but "unregister" does not seem to
be a meaningful operation. Can you clarify what it's for?

There are really two ways rseq TLS can end up being unregistered: either
through an explicit call to the rseq "unregister", or when the OS frees the
thread's task struct.
You bring an interesting point here: do we need to explicitly unregister
rseq at thread exit, or can we leave that to the OS ?
The key thing to look for here is whether it's valid to access the
TLS area of the thread from preemption or signal delivery happening
at the very end of START_THREAD_DEFN. If it's OK to access it until
the very end of the thread lifetime, then we could do without an
explicit unregistration. However, if at any given point of the late
thread lifetime we end up in a situation where reading or writing to
that TLS area can cause corruption, then we need to carefully
unregister it before that memory is reclaimed/reused.

The thread memory cannot be reused until after kernel task exit,
reported via the set_tid_address futex. Also, assuming signals are
blocked (which is absolutely necessary for other reasons) nothing in
userspace can touch the rseq state after this point anyway.

I was more confused about the need for reference counting, though.
Where would anything be able to observe a state other than "refcnt>0"?
-- in which case tracking it makes no sense. If the goal is to make an
ABI thatsupports environments where libc doesn't have rseq support,
and a third-party library is providing a compatible ABI, it seems all
that would be needed it a boolean thread-local "is_initialized" flag.
There does not seem to be any safe way such a library could be
dynamically unloaded (which would require unregistration in all
threads) and thus no need for a count.

Rich

Mathieu Desnoyers

2018-11-22 15:47:00 UTC

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Rich Felker

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

Maybe I'm missing something obvious, but "unregister" does not seem to
be a meaningful operation. Can you clarify what it's for?

There are really two ways rseq TLS can end up being unregistered: either
through an explicit call to the rseq "unregister", or when the OS frees the
thread's task struct.
You bring an interesting point here: do we need to explicitly unregister
rseq at thread exit, or can we leave that to the OS ?
The key thing to look for here is whether it's valid to access the
TLS area of the thread from preemption or signal delivery happening
at the very end of START_THREAD_DEFN. If it's OK to access it until
the very end of the thread lifetime, then we could do without an
explicit unregistration. However, if at any given point of the late
thread lifetime we end up in a situation where reading or writing to
that TLS area can cause corruption, then we need to carefully
unregister it before that memory is reclaimed/reused.

The thread memory cannot be reused until after kernel task exit,
reported via the set_tid_address futex. Also, assuming signals are
blocked (which is absolutely necessary for other reasons) nothing in
userspace can touch the rseq state after this point anyway.

As discussed in the other leg of the email thread, disabling signals is
not enough to prevent the kernel to access the rseq TLS area on preemption.

Post by Rich Felker
I was more confused about the need for reference counting, though.
Where would anything be able to observe a state other than "refcnt>0"?
-- in which case tracking it makes no sense. If the goal is to make an
ABI thatsupports environments where libc doesn't have rseq support,
and a third-party library is providing a compatible ABI, it seems all
that would be needed it a boolean thread-local "is_initialized" flag.
There does not seem to be any safe way such a library could be
dynamically unloaded (which would require unregistration in all
threads) and thus no need for a count.

Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).

Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads. But let's focus on how exactly those
libraries can handle lazily registering rseq. They can use pthread_key,
and pthread_setspecific on first use by the thread to setup a destructor
function to be invoked at thread exit. But each early adopter library
is unaware of the other, so if we just use a "is_initialized" flag, the
first destructor to run will unregister rseq while the second library
may still be using it.

The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.

The reference counter solves this: only the last rseq user for a thread
performs unregistration.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-22 16:28:42 UTC

Post by Mathieu Desnoyers
Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).
Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

Post by Mathieu Desnoyers
But let's focus on how exactly those libraries can handle lazily
registering rseq. They can use pthread_key, and pthread_setspecific on
first use by the thread to setup a destructor function to be invoked
at thread exit. But each early adopter library is unaware of the
other, so if we just use a "is_initialized" flag, the first destructor
to run will unregister rseq while the second library may still be
using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it. Then you'll only
need a mechanism to find the address of the actually active rseq area
(which you probably have to store in a TLS variable for performance
reasons). And that part you need whether you have reference counter or
not.

Post by Mathieu Desnoyers
The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.
The reference counter solves this: only the last rseq user for a thread
performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

Thanks,
Florian

Mathieu Desnoyers

2018-11-22 16:47:42 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers
Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).
Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

That's indeed what I do with lttng-ust, mainly due to use of pthread_key.

Post by Florian Weimer

Post by Mathieu Desnoyers
But let's focus on how exactly those libraries can handle lazily
registering rseq. They can use pthread_key, and pthread_setspecific on
first use by the thread to setup a destructor function to be invoked
at thread exit. But each early adopter library is unaware of the
other, so if we just use a "is_initialized" flag, the first destructor
to run will unregister rseq while the second library may still be
using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

Post by Florian Weimer
Then you'll only
need a mechanism to find the address of the actually active rseq area
(which you probably have to store in a TLS variable for performance
reasons). And that part you need whether you have reference counter or
not.

I'm not sure I follow your thoughts here. Currently, the __rseq_abi
TLS symbol identifies a structure registered to the kernel. The
"currently active" rseq critical section is identified by the field
"rseq_cs" within the __rseq_abi structure.

So here when you say "actually active rseq area", do you mean the
currently registered struct rseq (__rseq_abi) or the currently running
rseq critical section ? (pointed to by __rseq_abi.rseq_cs)

One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

So using the IE model for glibc makes sense, because nobody dlopen
glibc AFAIK. But it's not so simple for early adopter libraries which
can be dlopen'd.

Post by Florian Weimer

Post by Mathieu Desnoyers
The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.
The reference counter solves this: only the last rseq user for a thread
performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-22 16:59:44 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).
Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

That's indeed what I do with lttng-ust, mainly due to use of pthread_key.

Post by Florian Weimer

Post by Mathieu Desnoyers
But let's focus on how exactly those libraries can handle lazily
registering rseq. They can use pthread_key, and pthread_setspecific on
first use by the thread to setup a destructor function to be invoked
at thread exit. But each early adopter library is unaware of the
other, so if we just use a "is_initialized" flag, the first destructor
to run will unregister rseq while the second library may still be
using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

Post by Mathieu Desnoyers

Post by Florian Weimer
Then you'll only need a mechanism to find the address of the actually
active rseq area (which you probably have to store in a TLS variable
for performance reasons). And that part you need whether you have
reference counter or not.

I'm not sure I follow your thoughts here. Currently, the __rseq_abi
TLS symbol identifies a structure registered to the kernel. The
"currently active" rseq critical section is identified by the field
"rseq_cs" within the __rseq_abi structure.
So here when you say "actually active rseq area", do you mean the
currently registered struct rseq (__rseq_abi) or the currently running
rseq critical section ? (pointed to by __rseq_abi.rseq_cs)

__rseq_abi.

Post by Mathieu Desnoyers
One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.
The reference counter solves this: only the last rseq user for a thread
performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

Thanks,
Florian

Rich Felker

2018-11-22 17:10:10 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).
Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

That's indeed what I do with lttng-ust, mainly due to use of pthread_key.

Post by Florian Weimer

Post by Mathieu Desnoyers
But let's focus on how exactly those libraries can handle lazily
registering rseq. They can use pthread_key, and pthread_setspecific on
first use by the thread to setup a destructor function to be invoked
at thread exit. But each early adopter library is unaware of the
other, so if we just use a "is_initialized" flag, the first destructor
to run will unregister rseq while the second library may still be
using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

This is an implementation detail that should not leak to applications,
and I believe it's still considered a bug, in that, with large static
TLS, it could overflow or leave unusably little space left on an
otherwise-plenty-large application-provided stack.

Post by Florian Weimer

Post by Mathieu Desnoyers
One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

Use of IE model from a .so file (except possibly libc.so or something
else that inherently needs to be present at program startup for other
reasons) should be a considered a bug and unsupported usage.
Encouraging libraries to perpetuate this behavior is going backwards
on progress that's being made to end it.

Post by Florian Weimer

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.
The reference counter solves this: only the last rseq user for a thread
performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

Presumably they should have registered their need for rseq too,
thereby incrementing the reference count. I'm not sure this is a good
idea, but I think I understand it now.

Rich

Florian Weimer

2018-11-23 13:10:14 UTC

Post by Rich Felker

Post by Florian Weimer
I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

This is an implementation detail that should not leak to applications,
and I believe it's still considered a bug, in that, with large static
TLS, it could overflow or leave unusably little space left on an
otherwise-plenty-large application-provided stack.

Sure, but that does not matter in this context because right now, there
is no fix for this bug, and when we fix it, we can take backwards
compatibility into account.

Any library that ends up using rseq will need to coordinate with the
toolchain. I think that's unavoidable given the kernel interface.

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

Use of IE model from a .so file (except possibly libc.so or something
else that inherently needs to be present at program startup for other
reasons) should be a considered a bug and unsupported usage.
Encouraging libraries to perpetuate this behavior is going backwards
on progress that's being made to end it.

Why? Just because glibc's TCB allocation strategy is problematic?
We can fix that, even with dlopen.

If you are only concerned about the interactions with dlopen, then why
do you think initial-exec TLS is the problem, and not dlopen?

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

Presumably they should have registered their need for rseq too,
thereby incrementing the reference count. I'm not sure this is a good
idea, but I think I understand it now.

They may have to increase the reference count from 0 to 1, though, so
they have to re-register the rseq area. This tends to get rather messy.

I still I think implicit destruction of the rseq area is preferable over
this complexity.

Thanks,
Florian

Rich Felker

2018-11-23 14:28:43 UTC

Post by Florian Weimer

Post by Rich Felker

Post by Florian Weimer
I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

This is an implementation detail that should not leak to applications,
and I believe it's still considered a bug, in that, with large static
TLS, it could overflow or leave unusably little space left on an
otherwise-plenty-large application-provided stack.

Sure, but that does not matter in this context because right now, there
is no fix for this bug, and when we fix it, we can take backwards
compatibility into account.
Any library that ends up using rseq will need to coordinate with the
toolchain. I think that's unavoidable given the kernel interface.

Right. I don't agree with this. What I'm saying is that this behavior
(putting static TLS in the caller-provided stack) should not be
documented as a behavior applications can rely on or accounted as a
solution to the rseq problem, since doing so would preclude fixing the
"application doesn't have as much stack as it requested" bug.

Post by Florian Weimer

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

Use of IE model from a .so file (except possibly libc.so or something
else that inherently needs to be present at program startup for other
reasons) should be a considered a bug and unsupported usage.
Encouraging libraries to perpetuate this behavior is going backwards
on progress that's being made to end it.

Why? Just because glibc's TCB allocation strategy is problematic?
We can fix that, even with dlopen.
If you are only concerned about the interactions with dlopen, then why
do you think initial-exec TLS is the problem, and not dlopen?

The initial-exec model, *by design*, only works for TLS objects that
exist at initial execution time. That's why it's called initial-exec.
This is not an implementation flaw/limitation in glibc but
fundamental to the fact that you don't have an unlimited-size (or
practically unlimited) virtual address space range for each thread.
The global-dynamic model is the one that admits dynamic creation of
new TLS objects at runtime (thus the name).

Post by Florian Weimer

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

Presumably they should have registered their need for rseq too,
thereby incrementing the reference count. I'm not sure this is a good
idea, but I think I understand it now.

They may have to increase the reference count from 0 to 1, though, so
they have to re-register the rseq area. This tends to get rather messy.
I still I think implicit destruction of the rseq area is preferable over
this complexity.

Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

Rich

Mathieu Desnoyers

2018-11-23 17:05:20 UTC

----- On Nov 23, 2018, at 9:28 AM, Rich Felker ***@libc.org wrote:
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

If we rely on implicit unregistration by the exit system call, then we
need to be really sure that the __rseq_abi TLS area can be accessed
(load and store) from kernel preemption up to the point where exit
is invoked. If we have that guarantee with the IE model, then we should
be fine. This means the memory area with the __rseq_abi sits can only
be re-used after the tid field in the TLB is set to 0 by the exit system
call. Looking at allocatestack.c, it looks like the FREE_P () macro
does exactly that.

With all the above respected, we could rely on implicit rseq unregistration
by thread exit rather than do an explicit unregister. We could still need
to increment the __rseq_refcount upon thread start however, so we can
ensure early adopter libraries won't unregister rseq while glibc is using
it. No need to bring the refcount back to 0 in glibc though.

There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-23 17:30:19 UTC

Post by Mathieu Desnoyers
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

^^^^^^^^^^^^^^^

Post by Mathieu Desnoyers
Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

Post by Mathieu Desnoyers
First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Rich

Florian Weimer

2018-11-23 17:39:04 UTC

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well.

C++ makes it undefined:

<http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>

Thanks,
Florian

Rich Felker

2018-11-23 17:44:38 UTC

Post by Florian Weimer

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well.

<http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>

C also leaves access to pretty much anything from a signal handler
undefined, but that makes signals basically useless. POSIX
inadvertently defines a lot more than it wanted to by ignoring
indirect ways you can access objects using AS-safe functions to pass
around their addresses; there's an open issue for this:

http://austingroupbugs.net/view.php?id=728

I think it's reasonable to say, based on how fond POSIX is of signals
for realtime stuff, that it should allow some reasonable operations,
but just be more careful about what it allows, and disallowing access
to TLS would preclude the only ways to make signals non-awful for
multithreaded processes.

Rich

Florian Weimer

2018-11-23 18:01:38 UTC

Post by Rich Felker

Post by Florian Weimer

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well.

<http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>

C also leaves access to pretty much anything from a signal handler
undefined, but that makes signals basically useless.

Access to atomic variables of thread storage duration is defined,
though.

Thanks,
Florian

Mathieu Desnoyers

2018-11-23 17:52:21 UTC

Post by Rich Felker

Post by Mathieu Desnoyers
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

^^^^^^^^^^^^^^^

Post by Mathieu Desnoyers
Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.

Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?

Post by Rich Felker

Post by Mathieu Desnoyers
First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

Something can be technically correct based on the current implementation,
but fragile with respect to future changes. We need to carefully distinguish
between the two when exposing ABIs.

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ?

This question is still open.

Post by Rich Felker
Related to this,

Post by Mathieu Desnoyers
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Which is a shame for the lttng-ust tracer, which needs global-dynamic
TLS variables so it can be dlopen'd, but aims at allowing tracing from
signal handlers. It looks like due to limitations of global-dynamic
TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
could crash the process if the signal handler nests early at thread start
or late before thread exit. One way out of this would be to ensure signals
are blocked at thread start/exit, but I can't find the code responsible for
doing this within glibc.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-23 18:35:58 UTC

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

^^^^^^^^^^^^^^^

Post by Mathieu Desnoyers
Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.
Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise that it would become a public guarantee, it's not part of the
discussion around the API/ABI. Only part of the discussion around
implementation internals of the glibc rseq stuff.

Of course I may be biased thinking application code should not assume
this since it's not true on musl -- for detached threads, the thread
frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robustlist before exiting).

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is
meaningful or useful (weak in a shared library is largely meaningless)
but maybe I'm missing something here.

Post by Mathieu Desnoyers
Something can be technically correct based on the current implementation,
but fragile with respect to future changes. We need to carefully distinguish
between the two when exposing ABIs.

Yes.

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ?

This question is still open.

I can't find it -- maybe it's not done in glibc. It is in musl, and I
assumed glibc would also do it, because otherwise it's possible to see
some inconsistent states from signal handlers. Maybe these are all
undefined due to AS-unsafety of pthread_exit, but I think you can
construct examples where something could be observably wrong without
breaking any rules.

Post by Mathieu Desnoyers

Post by Rich Felker
Related to this,

Post by Mathieu Desnoyers
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Which is a shame for the lttng-ust tracer, which needs global-dynamic
TLS variables so it can be dlopen'd, but aims at allowing tracing from
signal handlers. It looks like due to limitations of global-dynamic
TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
could crash the process if the signal handler nests early at thread start
or late before thread exit. One way out of this would be to ensure signals
are blocked at thread start/exit, but I can't find the code responsible for
doing this within glibc.

Just blocking at start/exit won't solve the problem because
global-dynamic TLS in glibc involves dynamic allocation, which is hard
to make AS-safe and of course can fail, leaving no way to make forward
progress.

Rich

Mathieu Desnoyers

2018-11-23 21:09:07 UTC

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

^^^^^^^^^^^^^^^

Post by Mathieu Desnoyers
Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.
Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise that it would become a public guarantee, it's not part of the
discussion around the API/ABI. Only part of the discussion around
implementation internals of the glibc rseq stuff.
Of course I may be biased thinking application code should not assume
this since it's not true on musl -- for detached threads, the thread
frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robustlist before exiting).

OK, so on glibc, the implementation could rely on exit side-effect to
implicitly unregister rseq. On musl, based on the scenario you describe,
the library should unregister rseq explicitly before stack reclaim.

Am I understanding the situation correctly ?

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is
meaningful or useful (weak in a shared library is largely meaningless)
but maybe I'm missing something here.

Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

Using "weak" in a C library is something I'm not sure is a characteristic
we want or need, because I doubt we would ever want to load two libc at the
same time in a given process.

The only reason I see for using "weak" for the __rseq_abi symbol in the
libc is if we want to allow early adopter applications to define
__rseq_abi as a strong symbol, which would make some sense.

Post by Rich Felker

Post by Mathieu Desnoyers
Something can be technically correct based on the current implementation,
but fragile with respect to future changes. We need to carefully distinguish
between the two when exposing ABIs.

Yes.

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ?

This question is still open.

I can't find it -- maybe it's not done in glibc. It is in musl, and I
assumed glibc would also do it, because otherwise it's possible to see
some inconsistent states from signal handlers. Maybe these are all
undefined due to AS-unsafety of pthread_exit, but I think you can
construct examples where something could be observably wrong without
breaking any rules.

Good to know for the musl case.

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Rich Felker
Related to this,

Post by Mathieu Desnoyers
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Which is a shame for the lttng-ust tracer, which needs global-dynamic
TLS variables so it can be dlopen'd, but aims at allowing tracing from
signal handlers. It looks like due to limitations of global-dynamic
TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
could crash the process if the signal handler nests early at thread start
or late before thread exit. One way out of this would be to ensure signals
are blocked at thread start/exit, but I can't find the code responsible for
doing this within glibc.

Just blocking at start/exit won't solve the problem because
global-dynamic TLS in glibc involves dynamic allocation, which is hard
to make AS-safe and of course can fail, leaving no way to make forward
progress.

How hard would it be to create a async-signal-safe memory pool, which would
be always accessed with signals blocked, so we could fix those corner-cases
for good ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-26 08:28:34 UTC

Post by Mathieu Desnoyers
Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.

You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

Thanks,
Florian

Mathieu Desnoyers

2018-11-26 15:51:09 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers
Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.
You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

I was under the impression that loading the same strong symbol into an
application multiple times would cause some kind of warning if non-weak. I did
some testing to figure out which case I remembered would cause this.

When compiling with "-fno-common", dynamic and static linking work fine, but
trying to add multiple instances of a given symbol into a single object fails
with:

/tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
/tmp/ccQBJBOo.o:(.bss+0x0): first defined here

Even if the symbol has the same size.

So considering that we don't care about compiling into a single object here,
and only care about static and dynamic linking of libraries, indeed the "weak"
symbol is not useful.

So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-26 16:03:02 UTC

Post by Mathieu Desnoyers
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Yes, please. (But I'm still not sure we need the reference counter.)

Thanks,
Florian

Rich Felker

2018-11-26 17:10:45 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Yes, please. (But I'm still not sure we need the reference counter.)

The reference counter is needed for out-of-libc implementations
interacting and using the dtor hack. An in-libc implementation doesn't
need to inspect/honor the reference counter, but it does seem to need
to indicate that it has a reference, if you want it to be compatible
with out-of-libc implementations, so that the out-of-libc one will not
unregister the rseq before libc is done with it.

Alternatively another protocol could be chosen for this purpose, but
if has to be something stable and agreed upon, since things would
break badly if libc and the library providing rseq disagreed.

Rich

Mathieu Desnoyers

2018-11-26 19:22:05 UTC

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Yes, please. (But I'm still not sure we need the reference counter.)

The reference counter is needed for out-of-libc implementations
interacting and using the dtor hack. An in-libc implementation doesn't
need to inspect/honor the reference counter, but it does seem to need
to indicate that it has a reference, if you want it to be compatible
with out-of-libc implementations, so that the out-of-libc one will not
unregister the rseq before libc is done with it.

Let's consider two use-cases here: one (simpler) is use of rseq TLS
from thread context by out-of-libc implementations. The other is use of
rseq TLS from signal handler by out-of-libc implementations.

If we only care about users of rseq from thread context, then libc
could simply set the refcount value to 1 on thread start,
and should not care about the value on thread exit. The libc can
either directly call rseq unregister, or rely on thread calling exit
to implicitly unregister rseq, which depends on its own TLS life-time
guarantees. For instance, if the IE-model TLS is valid up until call
to exit, just calling the exit system call is fine. However, if a libc
has a window at thread exit during which the kernel can preempt the
thread with the IE-model TLS area being already reclaimed, then it
needs to explicitly call rseq unregister before freeing the TLS.

The second use-case is out-of-libc implementations using rseq from
signal handler. This one is trickier. First, pthread_key setspecific
is unfortunately not async-signal-safe. I can't find a good way to
seamlessly integrate rseq into out-of-libc signal handlers while
performing lazy registration without races on thread exit. If we
figure out a way to do this though, we should increment the refcount
at thread start in libc (rather than just set it to 1) in case a
signal handler gets nested immediately over the start of the thread
and registers rseq as well.

It looks like it's not the only issue I have with calling lttng-ust
instrumentation from signal handlers, here is the list I have so
far:

* glibc global-dynamic TLS variables are not async-signal-safe,
and lttng-ust cannot use IE-model TLS because it is meant to be
dlopen'd,
* pthread_setspecific is not async-signal-safe,

There should be ways to eventually solve those issues, but it would
be nice if for now the way rseq is implemented in libc does not add
yet another limitation for signal handlers.

Post by Rich Felker
Alternatively another protocol could be chosen for this purpose, but
if has to be something stable and agreed upon, since things would
break badly if libc and the library providing rseq disagreed.

Absolutely. We need to agree on that protocol before user-space
applications/libraries start using rseq.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Mathieu Desnoyers

2018-12-03 21:30:16 UTC

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Florian Weimer

Post by Mathieu Desnoyers
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Yes, please. (But I'm still not sure we need the reference counter.)

The reference counter is needed for out-of-libc implementations
interacting and using the dtor hack. An in-libc implementation doesn't
need to inspect/honor the reference counter, but it does seem to need
to indicate that it has a reference, if you want it to be compatible
with out-of-libc implementations, so that the out-of-libc one will not
unregister the rseq before libc is done with it.

Let's consider two use-cases here: one (simpler) is use of rseq TLS
from thread context by out-of-libc implementations. The other is use of
rseq TLS from signal handler by out-of-libc implementations.
If we only care about users of rseq from thread context, then libc
could simply set the refcount value to 1 on thread start,
and should not care about the value on thread exit. The libc can
either directly call rseq unregister, or rely on thread calling exit
to implicitly unregister rseq, which depends on its own TLS life-time
guarantees. For instance, if the IE-model TLS is valid up until call
to exit, just calling the exit system call is fine. However, if a libc
has a window at thread exit during which the kernel can preempt the
thread with the IE-model TLS area being already reclaimed, then it
needs to explicitly call rseq unregister before freeing the TLS.
The second use-case is out-of-libc implementations using rseq from
signal handler. This one is trickier. First, pthread_key setspecific
is unfortunately not async-signal-safe. I can't find a good way to
seamlessly integrate rseq into out-of-libc signal handlers while
performing lazy registration without races on thread exit. If we
figure out a way to do this though, we should increment the refcount
at thread start in libc (rather than just set it to 1) in case a
signal handler gets nested immediately over the start of the thread
and registers rseq as well.
It looks like it's not the only issue I have with calling lttng-ust
instrumentation from signal handlers, here is the list I have so
* glibc global-dynamic TLS variables are not async-signal-safe,
and lttng-ust cannot use IE-model TLS because it is meant to be
dlopen'd,
* pthread_setspecific is not async-signal-safe,
There should be ways to eventually solve those issues, but it would
be nice if for now the way rseq is implemented in libc does not add
yet another limitation for signal handlers.

So, after thinking about a bit further, considering that current glibc
do not offer the async-signal-safe APIs required to proceed to touch
global-dynamic TLS variables from signal handlers nor register pthread key
destructors from signal handlers, I will end up needing glibc improvements
to eventually make lttng-ust instrumentation signal-safe.

This means that my main use-case for supporting out-of-libc rseq registration
from signal handlers does not exist today, and will require new glibc APIs
anyway. Therefore, it would make sense to require use of rseq from signal
handlers to depend on rseq registration by glibc at thread start, and limit
the use-case of out-of-libc rseq registration to those that do not nest
within signal handlers.

If we _never_ even want to allow signal handlers to register rseq, we could
set the __rseq_refcount to 1 at thread start in nptl and nptl init. However,
if we want to eventually allow rseq registration from signal handlers in the
future, we may want to consider keeping the __rseq_refcount relaxed atomic
increment at thread start, as long as it does not represent a too big
performance overhead.

For thread exit, we don't care about the __rseq_refcount value at thread exit
and we can unregister rseq unconditionally.

As long as it is not too costly to increment the __rseq_refcount at thread start,
I would be inclined to keep it as an increment rather than setting it to 1, so
we can have more flexibility with respect to future registration of rseq from
signal handlers, even though it is not possible today.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Mathieu Desnoyers

2018-11-26 16:30:51 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.
You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

I was under the impression that loading the same strong symbol into an
application multiple times would cause some kind of warning if non-weak. I did
some testing to figure out which case I remembered would cause this.
When compiling with "-fno-common", dynamic and static linking work fine, but
trying to add multiple instances of a given symbol into a single object fails
/tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
/tmp/ccQBJBOo.o:(.bss+0x0): first defined here
Even if the symbol has the same size.
So considering that we don't care about compiling into a single object here,
and only care about static and dynamic linking of libraries, indeed the "weak"
symbol is not useful.
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
after all, especially for __rseq_abi which we needs to be initialized to a specific
value, which is therefore not a common symbol.

" --warn-common
Warn when a common symbol is combined with another common symbol or with a symbol definition. Unix
linkers allow this somewhat sloppy practice, but linkers on some other operating systems do not.
This option allows you to find potential problems from combining global symbols. Unfortunately,
some C libraries use this practice, so you may get some warnings about symbols in the libraries as
well as in your programs."

Thoughts ?

Thanks,

Mathieu

Post by Mathieu Desnoyers
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Rich Felker

2018-11-26 17:07:58 UTC

Post by Mathieu Desnoyers

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.
You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

I was under the impression that loading the same strong symbol into an
application multiple times would cause some kind of warning if non-weak. I did
some testing to figure out which case I remembered would cause this.
When compiling with "-fno-common", dynamic and static linking work fine, but
trying to add multiple instances of a given symbol into a single object fails
/tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
/tmp/ccQBJBOo.o:(.bss+0x0): first defined here
Even if the symbol has the same size.
So considering that we don't care about compiling into a single object here,
and only care about static and dynamic linking of libraries, indeed the "weak"
symbol is not useful.
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
after all, especially for __rseq_abi which we needs to be initialized to a specific
value, which is therefore not a common symbol.
" --warn-common
Warn when a common symbol is combined with another common symbol or with a symbol definition. Unix
linkers allow this somewhat sloppy practice, but linkers on some other operating systems do not.
This option allows you to find potential problems from combining global symbols. Unfortunately,
some C libraries use this practice, so you may get some warnings about symbols in the libraries as
well as in your programs."
Thoughts ?

AFAIK this has nothing to do with dynamic linking.

Rich

Mathieu Desnoyers

2018-12-05 17:24:40 UTC

Post by Rich Felker

----- On Nov 26, 2018, at 10:51 AM, Mathieu Desnoyers

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.
You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

I was under the impression that loading the same strong symbol into an
application multiple times would cause some kind of warning if non-weak. I did
some testing to figure out which case I remembered would cause this.
When compiling with "-fno-common", dynamic and static linking work fine, but
trying to add multiple instances of a given symbol into a single object fails
/tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
/tmp/ccQBJBOo.o:(.bss+0x0): first defined here
Even if the symbol has the same size.
So considering that we don't care about compiling into a single object here,
and only care about static and dynamic linking of libraries, indeed the "weak"
symbol is not useful.
So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
after all, especially for __rseq_abi which we needs to be initialized to a specific
value, which is therefore not a common symbol.
" --warn-common
Warn when a common symbol is combined with another common symbol or with a
symbol definition. Unix
linkers allow this somewhat sloppy practice, but linkers on some other operating
systems do not.
This option allows you to find potential problems from combining global symbols.
Unfortunately,
some C libraries use this practice, so you may get some warnings about symbols
in the libraries as
well as in your programs."
Thoughts ?

AFAIK this has nothing to do with dynamic linking.

Reading through the ELF specification, it seems to imply that "weak" only affects the link
editor behavior when combining relocatable objects, not the behavior of the dynamic linker.
Is that what you refer to when you say "weak" has nothing to do with dynamic linking ?

If that interpretation is correct, then indeed I should remove the "weak" from the __rseq_abi
and __rseq_refcount.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Szabolcs Nagy

2018-11-26 11:56:36 UTC

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
[...]

Post by Rich Felker
Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code shound unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

^^^^^^^^^^^^^^^

Post by Mathieu Desnoyers
Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.
Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise that it would become a public guarantee, it's not part of the
discussion around the API/ABI. Only part of the discussion around
implementation internals of the glibc rseq stuff.
Of course I may be biased thinking application code should not assume
this since it's not true on musl -- for detached threads, the thread
frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robustlist before exiting).

OK, so on glibc, the implementation could rely on exit side-effect to
implicitly unregister rseq. On musl, based on the scenario you describe,
the library should unregister rseq explicitly before stack reclaim.
Am I understanding the situation correctly ?

i think the point is that you don't need to know these
details in order to come up with a design that allows
both implementations. (then the libc can change later)

so
- is there a need for public unregister api (does the
user do it or the rseq library implicitly unregisters)?
- is there a need for ref counting (or the rseq lib
unconditionally unregisters at the end of a thread,
the libc can certainly do this)?

Post by Mathieu Desnoyers

Post by Rich Felker

Post by Mathieu Desnoyers
OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is
meaningful or useful (weak in a shared library is largely meaningless)
but maybe I'm missing something here.

Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.
Using "weak" in a C library is something I'm not sure is a characteristic
we want or need, because I doubt we would ever want to load two libc at the
same time in a given process.
The only reason I see for using "weak" for the __rseq_abi symbol in the
libc is if we want to allow early adopter applications to define
__rseq_abi as a strong symbol, which would make some sense.

weak really does not matter in dynamic linking
(unless you set the LD_DYNAMIC_WEAK env var for
backward compat with very old glibc, or if it's
an undefined weak reference)

Post by Mathieu Desnoyers

Post by Rich Felker
Just blocking at start/exit won't solve the problem because
global-dynamic TLS in glibc involves dynamic allocation, which is hard
to make AS-safe and of course can fail, leaving no way to make forward
progress.

How hard would it be to create a async-signal-safe memory pool, which would
be always accessed with signals blocked, so we could fix those corner-cases
for good ?

that is hard.

in musl tls access is as-safe, but it uses a different
approach: it does all allocations at thread creation or
dlopen time.

glibc has further issues because it supports dlclose
with module unloading and then dynamic tls related
internal structures are hard to free (it is valid to
implement dlclose as a noop, which is what musl does.
tls access needs to synchronize with dlopen and dlclose
when accessing internal structures, but you need a
lock-free mechanism if the access has to be as-safe,
and dlclo

Mathieu Desnoyers

2018-11-22 17:29:43 UTC

Post by Florian Weimer

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).
Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

That's indeed what I do with lttng-ust, mainly due to use of pthread_key.

Post by Florian Weimer

Post by Mathieu Desnoyers
But let's focus on how exactly those libraries can handle lazily
registering rseq. They can use pthread_key, and pthread_setspecific on
first use by the thread to setup a destructor function to be invoked
at thread exit. But each early adopter library is unaware of the
other, so if we just use a "is_initialized" flag, the first destructor
to run will unregister rseq while the second library may still be
using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

You said earlier in the email thread that user-supplied stack can be
reclaimed by __free_tcb () while the thread still runs, am I correct ?
If so, then we really want to unregister the rseq TLS before that.

I notice that __free_tcb () calls __deallocate_stack (), which invokes
_dl_deallocate_tls (). Accessing the TLS from the kernel upon preemption
would appear fragile after this call.

[...]

Post by Florian Weimer

Post by Mathieu Desnoyers
One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

LTTng-UST is meant to be used as a dependency for e.g. a java logger,
or a python logger. Those rely on dlopen, and it would be very painful
to ask all users to preload lttng-ust within their environment which is
sometimes already complex. It works today through dlopen, and I consider
this a user-facing behavior which I am very reluctant to break.

Post by Florian Weimer

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers
The same problem arises if we have an application early adopter which
explicitly deal with rseq, with a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.
The reference counter solves this: only the last rseq user for a thread
performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

You bring an interesting point. The reference counter suffice to ensure
that the kernel will not try to reference the TLS area beyond its registration
scope, but it does not guarantee that another destructor (or a signal
handler) won't try to use the rseq TLS area after it has been unregistered.

Unregistration of the TLS before freeing its memory is required for correctness.

However, a use-after-unregistration can be dealt with by other means. This
is one of the reasons why I want to upstream the "cpu_opv" system call into
Linux: this is a fallback mechanism to use when rseq cannot do forward
progress (e.g. debugger single-stepping), or to use in those scenarios
where rseq is not registered (early at thread creation, or late at thread
exit). Moreover, it allows handling use-cases of migration of data between
per-cpu data structures, which is pretty much impossible to do right if we
only have rseq available.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Florian Weimer

2018-11-23 13:29:18 UTC

Post by Mathieu Desnoyers

Post by Florian Weimer

Post by Mathieu Desnoyers

Post by Florian Weimer
I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

I'm not entirely sure because the glibc terminology is confusing, but I
think it places intial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

You said earlier in the email thread that user-supplied stack can be
reclaimed by __free_tcb () while the thread still runs, am I correct ?
If so, then we really want to unregister the rseq TLS before that.

No, dynamic TLS can be reclaimed. Static TLS (which I assume includes
initial-exec TLS) is not deallocated.

Post by Mathieu Desnoyers
I notice that __free_tcb () calls __deallocate_stack (), which invokes
_dl_deallocate_tls (). Accessing the TLS from the kernel upon preemption
would appear fragile after this call.

_dl_deallocate_tls only covers dynamic TLS.

Thanks,
Florian

50 Replies
14 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Mathieu Desnoyers 2018-11-21 18:39:34 UTC

Mathieu Desnoyers 2018-11-21 18:39:35 UTC

Florian Weimer 2018-11-21 18:47:43 UTC

Mathieu Desnoyers 2018-11-21 18:39:33 UTC

Mathieu Desnoyers 2018-11-21 18:39:36 UTC

Mathieu Desnoyers 2018-11-21 18:50:57 UTC

Florian Weimer 2018-11-21 18:50:16 UTC

Mathieu Desnoyers 2018-11-21 18:52:05 UTC

Mathieu Desnoyers 2018-11-21 18:55:11 UTC

Florian Weimer 2018-11-21 18:56:16 UTC

Mathieu Desnoyers 2018-11-21 22:23:00 UTC

Florian Weimer 2018-11-21 18:55:41 UTC

Rich Felker 2018-11-22 14:36:03 UTC

Mathieu Desnoyers 2018-11-22 15:04:16 UTC

Florian Weimer 2018-11-22 15:11:45 UTC

Rich Felker 2018-11-22 15:17:10 UTC

Florian Weimer 2018-11-22 15:21:02 UTC

Mathieu Desnoyers 2018-11-22 15:33:19 UTC

Rich Felker 2018-11-22 15:44:57 UTC

Szabolcs Nagy 2018-11-22 16:24:56 UTC

Mathieu Desnoyers 2018-11-22 18:35:44 UTC

Rich Felker 2018-11-22 19:01:43 UTC

Mathieu Desnoyers 2018-11-22 19:36:25 UTC

Rich Felker 2018-11-22 15:14:44 UTC

Mathieu Desnoyers 2018-11-22 15:47:00 UTC

Florian Weimer 2018-11-22 16:28:42 UTC

Mathieu Desnoyers 2018-11-22 16:47:42 UTC

Florian Weimer 2018-11-22 16:59:44 UTC

Rich Felker 2018-11-22 17:10:10 UTC

Florian Weimer 2018-11-23 13:10:14 UTC

Rich Felker 2018-11-23 14:28:43 UTC

Mathieu Desnoyers 2018-11-23 17:05:20 UTC

Rich Felker 2018-11-23 17:30:19 UTC

Florian Weimer 2018-11-23 17:39:04 UTC

Rich Felker 2018-11-23 17:44:38 UTC

Florian Weimer 2018-11-23 18:01:38 UTC

Mathieu Desnoyers 2018-11-23 17:52:21 UTC

Rich Felker 2018-11-23 18:35:58 UTC

Mathieu Desnoyers 2018-11-23 21:09:07 UTC

Florian Weimer 2018-11-26 08:28:34 UTC

Mathieu Desnoyers 2018-11-26 15:51:09 UTC

Florian Weimer 2018-11-26 16:03:02 UTC

Rich Felker 2018-11-26 17:10:45 UTC

Mathieu Desnoyers 2018-11-26 19:22:05 UTC

Mathieu Desnoyers 2018-12-03 21:30:16 UTC

Mathieu Desnoyers 2018-11-26 16:30:51 UTC

Rich Felker 2018-11-26 17:07:58 UTC

Mathieu Desnoyers 2018-12-05 17:24:40 UTC

Szabolcs Nagy 2018-11-26 11:56:36 UTC

Mathieu Desnoyers 2018-11-22 17:29:43 UTC

Florian Weimer 2018-11-23 13:29:18 UTC

about - legalese

Loading...