Discussion:
[Gc] test_stack on powerpc (power7)
Will Schmidt
2014-01-28 21:22:35 UTC
Permalink
Hi All,
I've been looking at the test_stack test case failure as seen on
ppc64 / power7 based systems. I don't have a fix, but believe I
understand where the problem is occurring.

The simplest case I've been able to duplicate is with three threads.
As I've added debug to the code, the problem gets harder to nail down
precisely, but this is what seems to be happening.

In the failure scenario:
The list appears OK during run_one_test() before and after
AO_stack_pop() is called. The thread is holding two entries in the t[i]
array, and the list still looks OK. The list is damaged after the
AO_stack_push() call is made.

Within AO_stack_push(),
[src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_require()]
The malfunction seems to be triggered while one of the threads is
between the "first=AO_load(list);" and the
"AO_compare_and_swap_release(list,first,next);". Either one or both of
the other threads will have removed and replaced multiple elements, such
that the compare and swap of list,first,next will pass the check, but
the list entries, particularly the next pointer at first, has changed.

This is referenced in the comment at that location:
/* Thus its next link cannot have changed out from under us, and we */
/* removed exactly one entry and preserved the rest of the list. */
/* Note that it is quite possible that an additional entry was */
/* inserted and removed while we were running; this is OK since the */
/* part of the list following first must have remained unchanged, and */
/* first must again have been at the head of the list when the */
/* compare_and_swap succeeded. */

which seem to be untrue in this case.


The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic
builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html),
(thats what they are there for), but I don't think that change
would help in this case.

My recommendation is that the test be rewritten to handle the case where
first->next has changed underneath the current thread.
A shorter term fix would probably be to disable the test_stack test for
power7 and newer processors, until it can be fixed.

Thanks,
-Will
Lennart Sorensen
2014-01-28 21:43:15 UTC
Permalink
Post by Will Schmidt
Hi All,
I've been looking at the test_stack test case failure as seen on
ppc64 / power7 based systems. I don't have a fix, but believe I
understand where the problem is occurring.
The simplest case I've been able to duplicate is with three threads.
As I've added debug to the code, the problem gets harder to nail down
precisely, but this is what seems to be happening.
The list appears OK during run_one_test() before and after
AO_stack_pop() is called. The thread is holding two entries in the t[i]
array, and the list still looks OK. The list is damaged after the
AO_stack_push() call is made.
Within AO_stack_push(),
[src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_require()]
The malfunction seems to be triggered while one of the threads is
between the "first=AO_load(list);" and the
"AO_compare_and_swap_release(list,first,next);". Either one or both of
the other threads will have removed and replaced multiple elements, such
that the compare and swap of list,first,next will pass the check, but
the list entries, particularly the next pointer at first, has changed.
/* Thus its next link cannot have changed out from under us, and we */
/* removed exactly one entry and preserved the rest of the list. */
/* Note that it is quite possible that an additional entry was */
/* inserted and removed while we were running; this is OK since the */
/* part of the list following first must have remained unchanged, and */
/* first must again have been at the head of the list when the */
/* compare_and_swap succeeded. */
which seem to be untrue in this case.
The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic
builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html),
(thats what they are there for), but I don't think that change
would help in this case.
My recommendation is that the test be rewritten to handle the case where
first->next has changed underneath the current thread.
A shorter term fix would probably be to disable the test_stack test for
power7 and newer processors, until it can be fixed.
So does this mean the stack test is wrong? I was worried that the actual
AO_ functions were wrong on powerpc.
--
Len Sorensen
Will Schmidt
2014-01-28 22:50:04 UTC
Permalink
Post by Lennart Sorensen
Post by Will Schmidt
Hi All,
I've been looking at the test_stack test case failure as seen on
ppc64 / power7 based systems. I don't have a fix, but believe I
understand where the problem is occurring.
The simplest case I've been able to duplicate is with three threads.
As I've added debug to the code, the problem gets harder to nail down
precisely, but this is what seems to be happening.
The list appears OK during run_one_test() before and after
AO_stack_pop() is called. The thread is holding two entries in the t[i]
array, and the list still looks OK. The list is damaged after the
AO_stack_push() call is made.
Within AO_stack_push(),
[src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_require()]
The malfunction seems to be triggered while one of the threads is
between the "first=AO_load(list);" and the
"AO_compare_and_swap_release(list,first,next);". Either one or both of
the other threads will have removed and replaced multiple elements, such
that the compare and swap of list,first,next will pass the check, but
the list entries, particularly the next pointer at first, has changed.
/* Thus its next link cannot have changed out from under us, and we */
/* removed exactly one entry and preserved the rest of the list. */
/* Note that it is quite possible that an additional entry was */
/* inserted and removed while we were running; this is OK since the */
/* part of the list following first must have remained unchanged, and */
/* first must again have been at the head of the list when the */
/* compare_and_swap succeeded. */
which seem to be untrue in this case.
The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic
builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html),
(thats what they are there for), but I don't think that change
would help in this case.
My recommendation is that the test be rewritten to handle the case where
first->next has changed underneath the current thread.
A shorter term fix would probably be to disable the test_stack test for
power7 and newer processors, until it can be fixed.
So does this mean the stack test is wrong? I was worried that the actual
AO_ functions were wrong on powerpc.
Yes, Thats my feeling right now, that the test is wrong. To clarify,
it's obviously not all over wrong, but there is definitely a corner
condition that we're able to hit on power7 and newer that we don't
(yet?) see on other platforms.
Alexander Herz
2014-01-29 12:46:44 UTC
Permalink
Hi,

I want to use a foreign allocator together with boehmgc:

template<T>
class X:public boehmgc::gc_cleanup
{
T* mem;

X()
{
mem=alloc();
}

~X()
{
free(mem);
}
};

where X is allocated via boehm's allocator methods.

Will this just work is is this asking for trouble?
If T contains pointers to objectes allocated via boehm's allocater, will
they be properly collected?

Thx,
Alex
Boehm, Hans
2014-01-29 07:05:43 UTC
Permalink
Interesting.

I suspect a memory ordering bug in atomic_ops_stack.c, or possibly a subtle miscompilation. As the rest of the comment states, the scenario you describe should be impossible, because the first note on the list was added to the blacklist. Thus it could have been removed, but not reinserted (without low-order-bit-twiddling) by another thread. If any of this had happened, the compare_and_swap should have failed.

The algorithm and correctness argument (for a hypothetical sequentially consistent machine) are described in a PODC 2004 paper (see http://www.hpl.hp.com/techreports/2004/HPL-2004-105.pdf). Not that PODC papers are always correct; but they have to be close enough to convince smart referees. And the fact that this only fails on Power7 suggests memory ordering issues rather than an outright algorithmic bug to me.

I'm somewhat concerned whether the store in the AO_compare_and_swap_acquire() in AO_stack_pop_explicit_aux_acquire() is actually ordered before the subsequent loads of first_ptr and next. The acquire is enforced with an lwsync, which normally doesn't order a store followed by a load. Since the store is atomic with a load, it may do the right thing; I'm not sure. Can you try inserting a "sync" and see if that helps?

I would also check that the recheck of first against AO_load(list) appears in the assembly code where it should.

There could easily be yet a different issue that could explain this. I don't immediately see anything that would definitely explain it. But given the subtleties of the Power memory model, and the fact that libatomic_ops' semantics aren't as carefully defined as they should be (or as carefully as the later C++11 primitives), there seem to be plenty of bug opportunities here.

Hans
-----Original Message-----
On Behalf Of Will Schmidt
Sent: Tuesday, January 28, 2014 1:23 PM
Subject: [Gc] test_stack on powerpc (power7)
Hi All,
I've been looking at the test_stack test case failure as seen on
ppc64 / power7 based systems. I don't have a fix, but believe I
understand where the problem is occurring.
The simplest case I've been able to duplicate is with three threads.
As I've added debug to the code, the problem gets harder to nail down
precisely, but this is what seems to be happening.
The list appears OK during run_one_test() before and after
AO_stack_pop() is called. The thread is holding two entries in the t[i] array,
and the list still looks OK. The list is damaged after the
AO_stack_push() call is made.
Within AO_stack_push(),
[src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_require()]
The malfunction seems to be triggered while one of the threads is between
the "first=AO_load(list);" and the
"AO_compare_and_swap_release(list,first,next);". Either one or both of the
other threads will have removed and replaced multiple elements, such that
the compare and swap of list,first,next will pass the check, but the list
entries, particularly the next pointer at first, has changed.
/* Thus its next link cannot have changed out from under us, and we */
/* removed exactly one entry and preserved the rest of the list. */
/* Note that it is quite possible that an additional entry was */
/* inserted and removed while we were running; this is OK since the */
/* part of the list following first must have remained unchanged, and */
/* first must again have been at the head of the list when the */
/* compare_and_swap succeeded. */
which seem to be untrue in this case.
The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic
builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-
Builtins.html),
(thats what they are there for), but I don't think that change would help in
this case.
My recommendation is that the test be rewritten to handle the case where
first->next has changed underneath the current thread.
A shorter term fix would probably be to disable the test_stack test for
power7 and newer processors, until it can be fixed.
Thanks,
-Will
_______________________________________________
Gc mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
Lennart Sorensen
2014-01-29 15:00:47 UTC
Permalink
Post by Boehm, Hans
Interesting.
I suspect a memory ordering bug in atomic_ops_stack.c, or possibly a subtle miscompilation. As the rest of the comment states, the scenario you describe should be impossible, because the first note on the list was added to the blacklist. Thus it could have been removed, but not reinserted (without low-order-bit-twiddling) by another thread. If any of this had happened, the compare_and_swap should have failed.
The algorithm and correctness argument (for a hypothetical sequentially consistent machine) are described in a PODC 2004 paper (see http://www.hpl.hp.com/techreports/2004/HPL-2004-105.pdf). Not that PODC papers are always correct; but they have to be close enough to convince smart referees. And the fact that this only fails on Power7 suggests memory ordering issues rather than an outright algorithmic bug to me.
I'm somewhat concerned whether the store in the AO_compare_and_swap_acquire() in AO_stack_pop_explicit_aux_acquire() is actually ordered before the subsequent loads of first_ptr and next. The acquire is enforced with an lwsync, which normally doesn't order a store followed by a load. Since the store is atomic with a load, it may do the right thing; I'm not sure. Can you try inserting a "sync" and see if that helps?
I would also check that the recheck of first against AO_load(list) appears in the assembly code where it should.
I can try to disassemble the binary if I knew which bit to look for.

I tried building with -O0 to disable the compiler from trying to optimize
things. No change.
Post by Boehm, Hans
There could easily be yet a different issue that could explain this. I don't immediately see anything that would definitely explain it. But given the subtleties of the Power memory model, and the fact that libatomic_ops' semantics aren't as carefully defined as they should be (or as carefully as the later C++11 primitives), there seem to be plenty of bug opportunities here.
If you can tell me what to add where I can very quickly test it.
--
len Sorensen
Will Schmidt
2014-01-30 18:56:03 UTC
Permalink
Post by Boehm, Hans
Interesting.
<...snip...>
Post by Boehm, Hans
I would also check that the recheck of first against AO_load(list) appears in the assembly code where it should.
Thanks for the suggestions. :-)

I looked closer at the code that does the x_bits twiddling, and after a
bit of quality time single-stepping within gdb, have found an issue.
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire() in
powerpc.h like I had initialy thought, per the #defines I was lookging
at in generalize-small.h. The critical detail in that is the lack of an
isync in the atomic_load.h version.

I'm admittedly unclear of how the path through the header file includes
should be.

The patch below (inline and attached) seems a bit hackish to me, but is
also sufficient to allow test_stack to run to completion on the P7 here.
(test_stack running in a loop, ~ 1000 successful runs so far). For
inclusion as-is, or as inspiration to whomever better understands the
include hierarchy.

Thanks,
-Will

--

Force AO_load() to map to AO_load_acquire() for powerpc. The
AO_load_acquire() function includes isync instructions that
are critical for proper behavior on power system.


Signed-Off-By: Will Schmidt <***@vnet.ibm.com>


diff -aur --exclude='*.Plo' --exclude='*.Po' libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h 2013-11-10 03:57:17.000000000 -0600
+++ libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h 2014-01-30 12:17:20.819984940 -0600
@@ -29,6 +29,8 @@

#include "../all_aligned_atomic_load_store.h"

+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case. */
Lennart Sorensen
2014-01-30 20:09:04 UTC
Permalink
Post by Will Schmidt
Post by Boehm, Hans
Interesting.
<...snip...>
Post by Boehm, Hans
I would also check that the recheck of first against AO_load(list) appears in the assembly code where it should.
Thanks for the suggestions. :-)
I looked closer at the code that does the x_bits twiddling, and after a
bit of quality time single-stepping within gdb, have found an issue.
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire() in
powerpc.h like I had initialy thought, per the #defines I was lookging
at in generalize-small.h. The critical detail in that is the lack of an
isync in the atomic_load.h version.
I'm admittedly unclear of how the path through the header file includes
should be.
The patch below (inline and attached) seems a bit hackish to me, but is
also sufficient to allow test_stack to run to completion on the P7 here.
(test_stack running in a loop, ~ 1000 successful runs so far). For
inclusion as-is, or as inspiration to whomever better understands the
include hierarchy.
Thanks,
-Will
--
Force AO_load() to map to AO_load_acquire() for powerpc. The
AO_load_acquire() function includes isync instructions that
are critical for proper behavior on power system.
diff -aur --exclude='*.Plo' --exclude='*.Po' libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h 2013-11-10 03:57:17.000000000 -0600
+++ libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h 2014-01-30 12:17:20.819984940 -0600
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case. */
diff -aur --exclude='*.Plo' --exclude='*.Po' libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h 2013-11-10 03:57:17.000000000 -0600
+++ libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h 2014-01-30 12:17:20.819984940 -0600
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case. */
Works for me on the version in Debian Wheezy
(libatomic-ops-7.2~alpha5+cvs20100601).

Yayyyy!
--
Len Sorensen
Ivan Maidanski
2014-02-01 08:52:00 UTC
Permalink
Hello Will,

Your current patch is just a workaround to make the test pass. It would be good to develop a proper patch.
Post by Will Schmidt
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire() in
powerpc.h like I had initialy thought, per the #defines I was looking
at in generalize-small.h. The critical detail in that is the lack of an
isync in the atomic_load.h version.
1. So, you say that aligned accesses are not guaranteed to be atomic, right? (This means that inclusion of all_aligned_atomic_load_store.h is incorrect.)
2. What about stores? What's about char/short/int wide loads?
3. What about other operations? If there is a gcc version produces correct code for atomic builtin, could please check assembly generated by tests/list_atomic.c (I'm referring here to master branch, i.e v7.4.0) of current implementation of primitives with the gcc/generic.h implementation (this could be done by commenting out entire content of gcc/powerpc.h including generic.h instead).

Thank you

Regards,
Ivan
Post by Will Schmidt
Post by Will Schmidt
Post by Boehm, Hans
Interesting.
<...snip...>
Post by Boehm, Hans
I would also check that the recheck of first against AO_load(list) appears in the assembly code where it should.
Thanks for the suggestions. :-)
I looked closer at the code that does the x_bits twiddling, and after a
bit of quality time single-stepping within gdb, have found an issue.
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire() in
powerpc.h like I had initialy thought, per the #defines I was lookging
at in generalize-small.h. The critical detail in that is the lack of an
isync in the atomic_load.h version.
I'm admittedly unclear of how the path through the header file includes
should be.
The patch below (inline and attached) seems a bit hackish to me, but is
also sufficient to allow test_stack to run to completion on the P7 here.
(test_stack running in a loop, ~ 1000 successful runs so far). For
inclusion as-is, or as inspiration to whomever better understands the
include hierarchy.
Thanks,
-Will
--
Force AO_load() to map to AO_load_acquire() for powerpc. The
AO_load_acquire() function includes isync instructions that
are critical for proper behavior on power system.
diff -aur --exclude='*.Plo' --exclude='*.Po' libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h 2013-11-10 03:57:17.000000000 -0600
+++ libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h 2014-01-30 12:17:20.819984940 -0600
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case. */
diff -aur --exclude='*.Plo' --exclude='*.Po' libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h 2013-11-10 03:57:17.000000000 -0600
+++ libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h 2014-01-30 12:17:20.819984940 -0600
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case. */
Works for me on the version in Debian Wheezy
(libatomic-ops-7.2~alpha5+cvs20100601).
Yayyyy!
--
Len Sorensen
_______________________________________________
Gc mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
--
ИваМ МайЎаМскОй
Will Schmidt
2014-02-03 20:30:39 UTC
Permalink
Post by Ivan Maidanski
Hello Will,
Hi,
Post by Ivan Maidanski
Your current patch is just a workaround to make the test pass. It
would be good to develop a proper patch.
Yes, no doubt. As I mentioned, I leave that to someone who understands
the intended include hierarchy here better than I do.
Post by Ivan Maidanski
Post by Will Schmidt
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire()
in
Post by Will Schmidt
powerpc.h like I had initialy thought, per the #defines I was
looking
Post by Will Schmidt
at in generalize-small.h. The critical detail in that is the lack of
an
Post by Will Schmidt
isync in the atomic_load.h version.
1. So, you say that aligned accesses are not guaranteed to be atomic,
right?
There may be a nit between being atomic versus the accesses having the
same view of storage. that said, and in this context, I'll go with
"correct"
Post by Ivan Maidanski
(This means that inclusion of all_aligned_atomic_load_store.h is
incorrect.)
I'm not certain what the intent is with regards to the header files. At
least some of the compare_and_swap function was coming out of powerpc.h.
This may be an #ifdef/#endif issue somewhere within.
Post by Ivan Maidanski
2. What about stores? What's about char/short/int wide loads?
(I'll paraphrase a bit from the bookII.) "The lwarx/stwcx instructions
together permit atomic update of a storage location." Access for data
types smaller than wordsize would need to round up, so
shifting/masking/inserting may apply.
Post by Ivan Maidanski
3. What about other operations? If there is a gcc version produces
correct code for atomic builtin, could please check assembly generated
by tests/list_atomic.c (I'm referring here to master branch, i.e
v7.4.0) of current implementation of primitives with the gcc/generic.h
implementation (this could be done by commenting out entire content of
gcc/powerpc.h including generic.h instead).
Thank you
Regards,
Ivan
Thu, 30 Jan 2014, 15:09 -05:00 from "Lennart Sorensen"
Post by Will Schmidt
Post by Boehm, Hans
Interesting.
<...snip...>
Post by Boehm, Hans
I would also check that the recheck of first against
AO_load(list) appears in the assembly code where it should.
Post by Will Schmidt
Thanks for the suggestions. :-)
I looked closer at the code that does the x_bits twiddling,
and after a
Post by Will Schmidt
bit of quality time single-stepping within gdb, have found
an issue.
Post by Will Schmidt
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the
AO_load_acquire() in
Post by Will Schmidt
powerpc.h like I had initialy thought, per the #defines I
was lookging
Post by Will Schmidt
at in generalize-small.h. The critical detail in that is the
lack of an
Post by Will Schmidt
isync in the atomic_load.h version.
I'm admittedly unclear of how the path through the header
file includes
Post by Will Schmidt
should be.
The patch below (inline and attached) seems a bit hackish to
me, but is
Post by Will Schmidt
also sufficient to allow test_stack to run to completion on
the P7 here.
Post by Will Schmidt
(test_stack running in a loop, ~ 1000 successful runs so
far). For
Post by Will Schmidt
inclusion as-is, or as inspiration to whomever better
understands the
Post by Will Schmidt
include hierarchy.
Thanks,
-Will
--
Force AO_load() to map to AO_load_acquire() for powerpc. The
AO_load_acquire() function includes isync instructions that
are critical for proper behavior on power system.
diff -aur --exclude='*.Plo' --exclude='*.Po'
libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
Post by Will Schmidt
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
2013-11-10 03:57:17.000000000 -0600
Post by Will Schmidt
+++
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
2014-01-30 12:17:20.819984940 -0600
Post by Will Schmidt
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case.
*/
Post by Will Schmidt
diff -aur --exclude='*.Plo' --exclude='*.Po'
libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
Post by Will Schmidt
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
2013-11-10 03:57:17.000000000 -0600
Post by Will Schmidt
+++
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
2014-01-30 12:17:20.819984940 -0600
Post by Will Schmidt
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case.
*/
Works for me on the version in Debian Wheezy
(libatomic-ops-7.2~alpha5+cvs20100601).
Yayyyy!
Ivan Maidanski
2014-02-16 15:20:13 UTC
Permalink
Hi Will,

I don't think current definition of AO_load is incorrect. (The definition has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)

Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in AO_stack_pop_explicit_aux_acquire) instead of "first = AO_load_acquire(list)" ?
Post by Will Schmidt
Post by Ivan Maidanski
Hello Will,
Hi,
Post by Ivan Maidanski
Your current patch is just a workaround to make the test pass. It
would be good to develop a proper patch.
Yes, no doubt. As I mentioned, I leave that to someone who understands
the intended include hierarchy here better than I do.
Post by Ivan Maidanski
Post by Will Schmidt
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the AO_load_acquire()
in
Post by Will Schmidt
powerpc.h like I had initialy thought, per the #defines I was
looking
Post by Will Schmidt
at in generalize-small.h. The critical detail in that is the lack of
an
Post by Will Schmidt
isync in the atomic_load.h version.
You could also check the GCC implementation of no-barrier atomic load for ppc/ppc64 by producing assembly for the following code:

char char_load(const volatile char *addr) {  return __atomic_load_n(addr, __ATOMIC_RELAXED); }

void char_store(volatile char *addr, char value) {  __atomic_store_n(addr, value, __ATOMIC_RELAXED); }

short short_load(const volatile short *addr) {  return __atomic_load_n(addr, __ATOMIC_RELAXED); }

void short_store(volatile short *addr, short value) {  __atomic_store_n(addr, value, __ATOMIC_RELAXED); }

int int_load(const volatile int *addr) {  return __atomic_load_n(addr, __ATOMIC_RELAXED); }

void int_store(volatile int *addr, int value) {  __atomic_store_n(addr, value, __ATOMIC_RELAXED); }
Post by Will Schmidt
Post by Ivan Maidanski
1. So, you say that aligned accesses are not guaranteed to be atomic,
right?
There may be a nit between being atomic versus the accesses having the
same view of storage. that said, and in this context, I'll go with
"correct"
I don't understand. AO_load does not require any barrier.

Regards,
Ivan
Post by Will Schmidt
Post by Ivan Maidanski
(This means that inclusion of all_aligned_atomic_load_store.h is
incorrect.)
I'm not certain what the intent is with regards to the header files. At
least some of the compare_and_swap function was coming out of powerpc.h.
This may be an #ifdef/#endif issue somewhere within.
Post by Ivan Maidanski
2. What about stores? What's about char/short/int wide loads?
(I'll paraphrase a bit from the bookII.) "The lwarx/stwcx instructions
together permit atomic update of a storage location." Access for data
types smaller than wordsize would need to round up, so
shifting/masking/inserting may apply.
Post by Ivan Maidanski
3. What about other operations? If there is a gcc version produces
correct code for atomic builtin, could please check assembly generated
by tests/list_atomic.c (I'm referring here to master branch, i.e
v7.4.0) of current implementation of primitives with the gcc/generic.h
implementation (this could be done by commenting out entire content of
gcc/powerpc.h including generic.h instead).
Thank you
Regards,
Ivan
Thu, 30 Jan 2014, 15:09 -05:00 from "Lennart Sorensen"
Post by Will Schmidt
Post by Boehm, Hans
Interesting.
<...snip...>
Post by Boehm, Hans
I would also check that the recheck of first against
AO_load(list) appears in the assembly code where it should.
Post by Will Schmidt
Thanks for the suggestions. :-)
I looked closer at the code that does the x_bits twiddling,
and after a
Post by Will Schmidt
bit of quality time single-stepping within gdb, have found
an issue.
Post by Will Schmidt
The AO_load() function is being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, rather than the
AO_load_acquire() in
Post by Will Schmidt
powerpc.h like I had initialy thought, per the #defines I
was lookging
Post by Will Schmidt
at in generalize-small.h. The critical detail in that is the
lack of an
Post by Will Schmidt
isync in the atomic_load.h version.
I'm admittedly unclear of how the path through the header
file includes
Post by Will Schmidt
should be.
The patch below (inline and attached) seems a bit hackish to
me, but is
Post by Will Schmidt
also sufficient to allow test_stack to run to completion on
the P7 here.
Post by Will Schmidt
(test_stack running in a loop, ~ 1000 successful runs so
far). For
Post by Will Schmidt
inclusion as-is, or as inspiration to whomever better
understands the
Post by Will Schmidt
include hierarchy.
Thanks,
-Will
--
Force AO_load() to map to AO_load_acquire() for powerpc. The
AO_load_acquire() function includes isync instructions that
are critical for proper behavior on power system.
diff -aur --exclude='*.Plo' --exclude='*.Po'
libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
Post by Will Schmidt
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
2013-11-10 03:57:17.000000000 -0600
Post by Will Schmidt
+++
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
2014-01-30 12:17:20.819984940 -0600
Post by Will Schmidt
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case.
*/
Post by Will Schmidt
diff -aur --exclude='*.Plo' --exclude='*.Po'
libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
Post by Will Schmidt
--- libatomic_ops-7.4.0/src/atomic_ops/sysdeps/gcc/powerpc.h
2013-11-10 03:57:17.000000000 -0600
Post by Will Schmidt
+++
libatomic_ops-7.4.0.new/src/atomic_ops/sysdeps/gcc/powerpc.h
2014-01-30 12:17:20.819984940 -0600
Post by Will Schmidt
@@ -29,6 +29,8 @@
#include "../all_aligned_atomic_load_store.h"
+#define AO_load(addr) AO_load_acquire(addr)
+
#include "../test_and_set_t_is_ao_t.h"
/* There seems to be no byte equivalent of lwarx, so this */
/* may really be what we want, at least in the 32-bit case.
*/
Works for me on the version in Debian Wheezy
(libatomic-ops-7.2~alpha5+cvs20100601).
Yayyyy!
_______________________________________________
Gc mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
Pavel Raiskup
2016-07-27 14:40:08 UTC
Permalink
Post by Ivan Maidanski
Hi Will,
I don't think current definition of AO_load is incorrect. (The definition
has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in AO_stack_pop_explicit_aux_acquire) instead of "first = AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream. My question is
whether it is better to have it applied, or rather not.

Note that there were some attempts to make this work on ppc64 before:
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470

That thread is however scattered across several mailing list archives -> I've
posted one patch for which I would love to hear ideas:
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html

Pavel
Ivan Maidanski
2016-07-27 18:26:14 UTC
Permalink
Hello Pavel and Hans,
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)

To Hans:
What do you think of the failure reason? Could you please better description of stack_aux?
Post by Pavel Raiskup
...
I need to understand more deeply the "stack_aux" structure and how you are using it, I still can't see whether there is guaranteed that the background algorithm may not have collisions (considering 2+ threads trying to make a pop and 2+ threads trying to make a push operation in parallel).
Regards,
Ivan
Post by Pavel Raiskup
Post by Ivan Maidanski
Hi Will,
I don't think current definition of AO_load is incorrect. (The definition
has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in AO_stack_pop_explicit_aux_acquire) instead of "first = AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream. My question is
whether it is better to have it applied, or rather not.
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470
That thread is however scattered across several mailing list archives -> I've
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html
Pavel
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Rex Dieter
2016-07-27 19:17:13 UTC
Permalink
Post by Ivan Maidanski
Hello Pavel and Hans,
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
This?

http://pkgs.fedoraproject.org/cgit/rpms/libatomic_ops.git/tree/gc_ppc64le_force_AO_load.patch

-- Rex
Pavel Raiskup
2016-07-28 07:39:31 UTC
Permalink
Post by Ivan Maidanski
Hello Pavel and Hans,
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
Rex Dieter posted correct link -> that probably originally comes from this
thread, but link to archives is here:
https://lists.opendylan.org/pipermail/bdwgc/2014-January/005841.html
Post by Ivan Maidanski
What do you think of the failure reason? Could you please better description of stack_aux?
Post by Pavel Raiskup
...
I need to understand more deeply the "stack_aux" structure and how you are
using it, I still can't see whether there is guaranteed that the background
algorithm may not have collisions (considering 2+ threads trying to make a
pop and 2+ threads trying to make a push operation in parallel).
I was unable to download the patch from archives, so I'm re-attaching the
"attempt" to move this forward again.

Pavel
Ivan Maidanski
2016-08-03 20:16:36 UTC
Permalink
Hello Pavel,
I've seen Fedora maintainer applied this workaround downstream. My question is whether it is better to have it applied, or rather not.
No. I checked gcc (4.8 ppc64le) __atomic_load_n - it is the same as current AO_load implementation. This means even if we apply this hack and later switch to gcc atomics then the issue comes again.

But I'm OK with more local change - replace one or several AO_load operations with AO_load_acquire in atomic_ops_stack.c for power7 - this is, of course, still a hack as we don't know the root cause of the failure.

1. The attached patch should work but please check it.
2. Is it possible to make the patch with a smaller number of changes?
I need to understand more deeply the "stack_aux" structure and how you are using it, I still can't see whether there is guaranteed that the background algorithm may not have collisions (considering 2+ threads trying to make a pop and 2+ threads trying to make a push operation in parallel).
I want Hans to look at it deeper. (It seems he's busy at this moment.)

PS. Link to the issue on Github: https://github.com/ivmai/libatomic_ops/issues/15

Regards,
Ivan
Hello Pavel and Hans,
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
What do you think of the failure reason? Could you please better description of stack_aux?
Post by Pavel Raiskup
...
I need to understand more deeply the "stack_aux" structure and how you are using it, I still can't see whether there is guaranteed that the background algorithm may not have collisions (considering 2+ threads trying to make a pop and 2+ threads trying to make a push operation in parallel).
Regards,
Ivan
Post by Pavel Raiskup
Post by Ivan Maidanski
Hi Will,
I don't think current definition of AO_load is incorrect. (The definition
has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in AO_stack_pop_explicit_aux_acquire) instead of "first = AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream. My question is
whether it is better to have it applied, or rather not.
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470
That thread is however scattered across several mailing list archives -> I've
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html
Pavel
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Hans Boehm
2016-08-13 20:55:16 UTC
Permalink
I finally had a chance to look at this. Sorry about the llloooonnnnngggggg
delay.

This is really a combination of imprecision in the atomic_ops definitions,
my lack of understanding of PowerPC semantics at the time, and the
resulting sloppiness in the stack implementation.

I think that the core issue is that AO_stack_pop_explicit_aux_acquire
really needs to ensure that the store to the black list
via AO_compare_and_swap_acquire becomes visible before the load to check
the list head on line 155. This effectively needs store->load ordering.
Currently the only ordering here is imposed by the _acquire on the
compare_and_swap. On PowerPC that turns into an lwsync, which is too weak
to enforce store to load ordering.

Given our current definitions, it's probably best to drop the _acquire from
the CAS on line 136, and add a AO_nop_full just before line 154. This is
suboptimal on x86, and we may want to make the fence conditional on "not
x86", where the CAS already includes sufficient ordering. (With C++11
atomics, this would also be tricky and probably involve making a bunch of
accesses seq_cst.)

It would be good if someone with PowerPC access could confirm that this
fixes the problem.

Hans
Post by Ivan Maidanski
Hello Pavel,
I've seen Fedora maintainer applied this workaround downstream. My
question is whether it is better to have it applied, or rather not.
No. I checked gcc (4.8 ppc64le) __atomic_load_n - it is the same as
current AO_load implementation. This means even if we apply this hack and
later switch to gcc atomics then the issue comes again.
But I'm OK with more local change - replace one or several AO_load
operations with AO_load_acquire in atomic_ops_stack.c for power7 - this is,
of course, still a hack as we don't know the root cause of the failure.
1. The attached patch should work but please check it.
2. Is it possible to make the patch with a smaller number of changes?
I need to understand more deeply the "stack_aux" structure and how you
are using it, I still can't see whether there is guaranteed that the
background algorithm may not have collisions (considering 2+ threads trying
to make a pop and 2+ threads trying to make a push operation in parallel).
I want Hans to look at it deeper. (It seems he's busy at this moment.)
PS. Link to the issue on Github: https://github.com/ivmai/
libatomic_ops/issues/15
Regards,
Ivan
Hello Pavel and Hans,
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
What do you think of the failure reason? Could you please better description of stack_aux?
...
I need to understand more deeply the "stack_aux" structure and how you
are using it, I still can't see whether there is guaranteed that the
background algorithm may not have collisions (considering 2+ threads trying
to make a pop and 2+ threads trying to make a push operation in parallel).
Regards,
Ivan
Hi Will,
I don't think current definition of AO_load is incorrect. (The definition
has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in
AO_stack_pop_explicit_aux_acquire) instead of "first =
AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream. My
question is
whether it is better to have it applied, or rather not.
http://comments.gmane.org/gmane.comp.programming.
garbage-collection.boehmgc/5470
That thread is however scattered across several mailing list archives -> I've
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-
December/005507.html
Pavel
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Ivan Maidanski
2016-08-15 16:07:40 UTC
Permalink
Thank you, Hans.
Here's the commit to try: https://github.com/ivmai/libatomic_ops/commit/85dd735949d712fa5db4bde0f0fc74f15a624222
Regards,
Ivan
--
Sat, 13 Aug 2016, 23:55 +03:00 from Hans Boehm <***@acm.org>:
I finally had a chance to look at this.  Sorry about the llloooonnnnngggggg delay.
This is really a combination of imprecision in the atomic_ops definitions, my lack of understanding of PowerPC semantics at the time, and the resulting sloppiness in the stack implementation.
I think that the core issue is that AO_stack_pop_explicit_aux_acquire really needs to ensure that the store to the black list via AO_compare_and_swap_acquire becomes visible before the load to check the list head on line 155.  This effectively needs store->load ordering. Currently the only ordering here is imposed by the _acquire on the compare_and_swap. On PowerPC that turns into an lwsync, which is too weak to enforce store to load ordering.
Given our current definitions, it's probably best to drop the _acquire from the CAS on line 136, and add a AO_nop_full just before line 154.  This is suboptimal on x86, and we may want to make the fence conditional on "not x86", where the CAS already includes sufficient ordering.  (With C++11 atomics, this would also be tricky and probably involve making a bunch of accesses seq_cst.)
It would be good if someone with PowerPC access could confirm that this fixes the problem.
Hans
Post by Ivan Maidanski
Hello Pavel,
I've seen Fedora maintainer applied this workaround downstream. My question is whether it is better to have it applied, or rather not.
No. I checked gcc (4.8 ppc64le) __atomic_load_n - it is the same as current AO_load implementation. This means even if we apply this hack and later switch to gcc atomics then the issue comes again.
But I'm OK with more local change - replace one or several AO_load operations with AO_load_acquire in atomic_ops_stack.c for power7 - this is, of course, still a hack as we don't know the root cause of the failure.
1. The attached patch should work but please check it.
2. Is it possible to make the patch with a smaller number of changes?
I need to understand more deeply the "stack_aux" structure and how you are using it, I still can't see whether there is guaranteed that the background algorithm may not have collisions (considering 2+ threads trying to make a pop and 2+ threads trying to make a push operation in parallel).
I want Hans to look at it deeper. (It seems he's busy at this moment.)
PS. Link to the issue on Github: https://github.com/ivmai/libatomic_ops/issues/15
Regards,
Ivan
Wed, 27 Jul 2016, 21:26 +03:00 from Ivan Maidanski <***@mail.ru>:
Hello Pavel and Hans,
Post by Ivan Maidanski
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
What do you think of the failure reason? Could you please better description of stack_aux?
Post by Pavel Raiskup
...
I need to understand more deeply the "stack_aux" structure and how you are using it, I still can't see whether there is guaranteed that the background algorithm may not have collisions (considering 2+ threads trying to make a pop and 2+ threads trying to make a push operation in parallel).
Regards,
Ivan
  Hi Will,
I don't think current definition of AO_load is incorrect. (The definition
has never been changed and all other supported platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in AO_stack_pop_explicit_aux_acquire) instead of "first = AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream.  My question is
whether it is better to have it applied, or rather not.
Post by Ivan Maidanski
Post by Pavel Raiskup
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470
That thread is however scattered across several mailing list archives -> I've
Post by Ivan Maidanski
Post by Pavel Raiskup
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html
Pavel
Will Schmidt
2016-08-16 14:58:24 UTC
Permalink
Post by Ivan Maidanski
Thank you, Hans.
https://github.com/ivmai/libatomic_ops/commit/85dd735949d712fa5db4bde0f0fc74f15a624222
Sorry I've not been able to apply much help,.. different priorities for
me now.. :-)


I re-tested with libatomic_ops HEAD this morning, which is
at 90108880a324be588ae61ec12d844f4f7db1968e
and includes 85dd735949d712fa5db4bde0f0fc74f15a624222

This still fails on power. (power7 and power8).

Can you/someone verify what the AO_load() function should be mapping to?
Per a comment I made in that early email (Jan 2014), I had found that
the AO_load() function was (and still is) being mapped to AO_load() in
sysdeps/loadstore/atomic_load.h, versus the AO_load_acquire() in
powerpc.h. ... which led me to wonder if there is a logic issue in the
#if defined chains in generalize-small.h

Thanks,
-Will
Post by Ivan Maidanski
Regards,
Ivan
--
I finally had a chance to look at this. Sorry about the
llloooonnnnngggggg delay.
This is really a combination of imprecision in the atomic_ops
definitions, my lack of understanding of PowerPC semantics at the
time, and the resulting sloppiness in the stack implementation.
I think that the core issue is that AO_stack_pop_explicit_aux_acquire
really needs to ensure that the store to the black list
via AO_compare_and_swap_acquire becomes visible before the load to
check the list head on line 155. This effectively needs store->load
ordering. Currently the only ordering here is imposed by the _acquire
on the compare_and_swap. On PowerPC that turns into an lwsync, which
is too weak to enforce store to load ordering.
Given our current definitions, it's probably best to drop the _acquire
from the CAS on line 136, and add a AO_nop_full just before line 154.
This is suboptimal on x86, and we may want to make the fence
conditional on "not x86", where the CAS already includes sufficient
ordering. (With C++11 atomics, this would also be tricky and probably
involve making a bunch of accesses seq_cst.)
It would be good if someone with PowerPC access could confirm that this fixes the problem.
Hans
Post by Ivan Maidanski
Hello Pavel,
I've seen Fedora maintainer applied this workaround downstream. My
question is whether it is better to have it applied, or rather not.
Post by Ivan Maidanski
No. I checked gcc (4.8 ppc64le) __atomic_load_n - it is the same as
current AO_load implementation. This means even if we apply this hack
and later switch to gcc atomics then the issue comes again.
Post by Ivan Maidanski
But I'm OK with more local change - replace one or several AO_load
operations with AO_load_acquire in atomic_ops_stack.c for power7 -
this is, of course, still a hack as we don't know the root cause of
the failure.
Post by Ivan Maidanski
1. The attached patch should work but please check it.
2. Is it possible to make the patch with a smaller number of changes?
I need to understand more deeply the "stack_aux" structure and how
you are using it, I still can't see whether there is guaranteed that
the background algorithm may not have collisions (considering 2+
threads trying to make a pop and 2+ threads trying to make a push
operation in parallel).
Post by Ivan Maidanski
I want Hans to look at it deeper. (It seems he's busy at this
moment.)
https://github.com/ivmai/libatomic_ops/issues/15
Post by Ivan Maidanski
Regards,
Ivan
Hello Pavel and Hans,
Post by Ivan Maidanski
Post by Pavel Raiskup
I've seen Fedora maintainer applied this workaround downstream.
Please give a reference? (sorry I failed to access the patch in ML)
What do you think of the failure reason? Could you please better
description of stack_aux?
Post by Ivan Maidanski
Post by Pavel Raiskup
...
I need to understand more deeply the "stack_aux" structure and how
you are using it, I still can't see whether there is guaranteed that
the background algorithm may not have collisions (considering 2+
threads trying to make a pop and 2+ threads trying to make a push
operation in parallel).
Post by Ivan Maidanski
Regards,
Ivan
Wed, 27 Jul 2016, 17:40 +03:00 from Pavel Raiskup
Post by Ivan Maidanski
Hi Will,
I don't think current definition of AO_load is incorrect. (The
definition
Post by Ivan Maidanski
has never been changed and all other supported platforms have the
same
Post by Ivan Maidanski
no-barrier atomic load definition.)
Probably the barrier should be added to stack implementation.
Is it really correct to have "first = AO_load(list)" (in
AO_stack_pop_explicit_aux_acquire) instead of "first =
AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround downstream. My question is
whether it is better to have it applied, or rather not.
Post by Ivan Maidanski
Post by Pavel Raiskup
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470
That thread is however scattered across several mailing list archives -> I've
Post by Ivan Maidanski
Post by Pavel Raiskup
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html
Pavel
_______________________________________________
bdwgc mailing list
Post by Ivan Maidanski
Post by Pavel Raiskup
https://lists.opendylan.org/mailman/listinfo/bdwgc
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Ivan Maidanski
2016-08-05 21:11:05 UTC
Permalink
Hi Will,

Thank you for the testing and reply.
Hello Rex and Will,
While Pavel is not accessible (till Aug 15), if possible, could you
please check the proposed diff?
Hi Ivan,
I applied this diff against what I believe is the latest,
patching file src/atomic_ops/sysdeps/ibmc/powerpc.h
Hunk #1 succeeded at 126 (offset -55 lines).
patching file src/atomic_ops_stack.c
The diff is against master. (But it is OK to apply to ver 7.4.4.)
The test fails with this patch applied. Sorry, the patch was incorrect (it assumed ibmc not gcc) - the right one if attached now (the patch is again a draft for testing). I'm 100% sure it will workaround the issue (because it is exactly the same as the proposed older workaround (AO_load=AO_load_acquire) after preprocessing).
The malfunction seems to be triggered while one of the threads is between the "first=AO_load(list);" and the "AO_compare_and_swap_release(list,first,next);"
The attached patch contains redundant changes - the question is which ones? Could you please try the modified patch (e.g. remove all changes except for AO_load(list))?
FAILED ( ~line 190 in test_stack.c)
Found duplicate list element 5 ( ~ line 131 in test_stack.c)
where '5' actually varies in value between 1 through 6.
This readily fails on anything power7 and newer. I'm testing on both
power7 and power8(LE and BE). Higher rate of failures on larger
systems, an underpowered power7 lpar with four processor threads still
gets ~ %20 failure rate, a larger box closer to %100. Running the test
under taskset and binding it to a single cpu eliminates the
errors.. :-)
But I'm OK with more local change - replace one or several AO_load
operations with AO_load_acquire in atomic_ops_stack.c for power7 -
this is, of course, still a hack as we don't know the root cause
of the failure.
I sadly don't directly recall the details from when I last poked at
this. The behavior feels like this is due to missing a barrier, which
isn't as big an issue for x86 sort of systems, but required for weakly
consistent memory model systems.
Looking back at the archives, looks like I had narrowed things down to
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2014-January/005825.html
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2014-January/005840.html
The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic builtins be used ( http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html ), (that's what they are there for), but I don't think that change would help in this case.
Agree. Now you could test the ao_stack based on builtins (but it should fail too) - git clone https://github.com/ivmai/libatomic_ops.git -b ppc_gcc_atomics && cd libatomic_ops && ./autogen.sh && ./configure && make check

Regards,
Ivan
Thanks
-Will
Regards,
Ivan
-------- Forwarded email --------
Date: СреЎа, 3 августа 2016, 23:16 +03:00
Subj: Re[5]: [Gc] test_stack on powerpc (power7)
Hello Pavel,
I've seen Fedora maintainer applied this workaround downstream. My
question is whether it is better to have it applied, or rather not.
No. I checked gcc (4.8 ppc64le) __atomic_load_n - it is the same as
current AO_load implementation. This means even if we apply this hack
and later switch to gcc atomics then the issue comes again.
But I'm OK with more local change - replace one or several AO_load
operations with AO_load_acquire in atomic_ops_stack.c for power7 -
this is, of course, still a hack as we don't know the root cause of
the failure.
1. The attached patch should work but please check it.
2. Is it possible to make the patch with a smaller number of changes?
I need to understand more deeply the "stack_aux" structure and how
you are using it, I still can't see whether there is guaranteed that
the background algorithm may not have collisions (considering 2+
threads trying to make a pop and 2+ threads trying to make a push
operation in parallel).
I want Hans to look at it deeper. (It seems he's busy at this moment.)
https://github.com/ivmai/libatomic_ops/issues/15
Regards,
Ivan
Wed, 27 Jul 2016, 21:26 +03:00 from Ivan Maidanski
Hello Pavel and Hans,
I've seen Fedora maintainer applied this workaround
downstream.
Please give a reference? (sorry I failed to access the patch
in ML)
What do you think of the failure reason? Could you please
better description of stack_aux?
...
I need to understand more deeply the "stack_aux" structure
and how you are using it, I still can't see whether there is
guaranteed that the background algorithm may not have
collisions (considering 2+ threads trying to make a pop and 2+
threads trying to make a push operation in parallel).
Regards,
Ivan
Wed, 27 Jul 2016, 17:40 +03:00 from Pavel Raiskup
On Sunday, February 16, 2014 7:20:13 PM CEST Ivan
Hi Will,
I don't think current definition of AO_load is
incorrect. (The definition
has never been changed and all other supported
platforms have the same
no-barrier atomic load definition.)
Probably the barrier should be added to stack
implementation.
Is it really correct to have "first =
AO_load(list)" (in AO_stack_pop_explicit_aux_acquire)
instead of "first = AO_load_acquire(list)" ?
I've seen Fedora maintainer applied this workaround
downstream. My question is
whether it is better to have it applied, or rather
not.
Note that there were some attempts to make this work
http://comments.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/5470
That thread is however scattered across several
mailing list archives -> I've
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-December/005507.html
Pavel
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Loading...