Discussion:
RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
Robbin Ehn
2018-10-26 14:33:18 UTC
Permalink
Hi, please review.

When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.

But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and the
third thread is stuck waiting on the VM thread.

To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.

Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.

Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/

Thanks, Robbin
Erik Österlund
2018-10-26 14:37:34 UTC
Permalink
Hi Robbin,

Looks good.

Thanks,
/Erik
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread
is present and tries to execute a VM operation, hence waiting on VM
thread to finish the handshake, the JavaThread in the exit path can
never reach the handshake cancellation point. VM thread cannot
finishes the handshake and the third thread is stuck waiting on the VM
thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
Thanks, Robbin
Robbin Ehn
2018-10-28 19:58:19 UTC
Permalink
Thanks Erik!

/Robbin
Post by Erik Österlund
Hi Robbin,
Looks good.
Thanks,
/Erik
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and the
third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
Thanks, Robbin
Daniel D. Daugherty
2018-10-26 15:38:51 UTC
Permalink
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread
is present and tries to execute a VM operation, hence waiting on VM
thread to finish the handshake, the JavaThread in the exit path can
never reach the handshake cancellation point. VM thread cannot
finishes the handshake and the third thread is stuck waiting on the VM
thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
    No comments.

src/hotspot/share/runtime/handshake.cpp
    L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
    L359:   assert(Thread::current()->is_VM_thread(), "should call from
vm thread");
        Both calls to handshake_process_by_vmthread() which calls this
        function are made with the Threads_lock held:

        MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);

        Looks like the lock is grabbed because of
        possibly_vmthread_can_process_handshake() which asserts:

        L351:   // An externally suspended thread cannot be resumed
while the
        L352:   // Threads_lock is held so it is safe.
        L353:   // Note that this method is allowed to produce false
positives.
        L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
        L355:   if (target->is_ext_suspended()) {
        L356:     return true;
        L357:   }

        Also looks like vmthread_can_process_handshake() needs the
        Threads_lock for the same externally suspended thread check.

        So I was going to ask that you add:

        assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");

        after L359, but how about a comment instead:

        // Threads_lock must be held here, but that is assert()ed in
        // possibly_vmthread_can_process_handshake().


src/hotspot/share/runtime/thread.hpp
    No comments.

src/hotspot/share/runtime/thread.cpp
    No comments.

src/hotspot/share/runtime/threadSMR.cpp
    No comments.

test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    Very nice test! It specifically exercises ThreadLocalHandshakes
    with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
    only ran into this bug by accident (JDK-8212152) so I like the
    targeted test.

    L49:         while(!exit_now) {
        nit - please add a space before '('

    L51:             for (int i = 0; i < _threads.length; i+=2) {
    L58:             for (int i = 0; i < _threads.length; i+=2) {
        nit - please added spaces around '+='

        So why every other thread? A comment would be good...

    L52:                 wb.handshakeWalkStack(null, true);
        I'm guessing the 'null' parameter means current thread, but
        that's a guess on my part. A comment would be good.

    L82:         for (int i = 0; i < _threads.length; i++) {
    L83:             _threads[i].join();
    L84:         }
        Thanks for cleaning up the test_threads. That will make
        the JTREG thread sweeper happy. However, you don't save
        the test_exit_thread references and you don't clean those
        up either. Yes, I realize that they are supposed to exit,
        but if something hangs up on exit, I'd rather have a join()
        hang failure in this test's code than have the JTREG thread
        sweeper catch it.

Dan
Post by Robbin Ehn
Thanks,
Robbin Ehn
2018-10-28 20:08:36 UTC
Permalink
Hi Dan,

Thanks for looking at this, here is the update:
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/

/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and the
third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
    No comments.
src/hotspot/share/runtime/handshake.cpp
    L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
    L359:   assert(Thread::current()->is_VM_thread(), "should call from vm
thread");
        Both calls to handshake_process_by_vmthread() which calls this
        MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
        Looks like the lock is grabbed because of
        L351:   // An externally suspended thread cannot be resumed while the
        L352:   // Threads_lock is held so it is safe.
        L353:   // Note that this method is allowed to produce false positives.
        L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
        L355:   if (target->is_ext_suspended()) {
        L356:     return true;
        L357:   }
        Also looks like vmthread_can_process_handshake() needs the
        Threads_lock for the same externally suspended thread check.
        assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
        // Threads_lock must be held here, but that is assert()ed in
        // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
    No comments.
src/hotspot/share/runtime/thread.cpp
    No comments.
src/hotspot/share/runtime/threadSMR.cpp
    No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    Very nice test! It specifically exercises ThreadLocalHandshakes
    with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
    only ran into this bug by accident (JDK-8212152) so I like the
    targeted test.
    L49:         while(!exit_now) {
        nit - please add a space before '('
    L51:             for (int i = 0; i < _threads.length; i+=2) {
    L58:             for (int i = 0; i < _threads.length; i+=2) {
        nit - please added spaces around '+='
        So why every other thread? A comment would be good...
    L52:                 wb.handshakeWalkStack(null, true);
        I'm guessing the 'null' parameter means current thread, but
        that's a guess on my part. A comment would be good.
    L82:         for (int i = 0; i < _threads.length; i++) {
    L83:             _threads[i].join();
    L84:         }
        Thanks for cleaning up the test_threads. That will make
        the JTREG thread sweeper happy. However, you don't save
        the test_exit_thread references and you don't clean those
        up either. Yes, I realize that they are supposed to exit,
        but if something hangs up on exit, I'd rather have a join()
        hang failure in this test's code than have the JTREG thread
        sweeper catch it.
Dan
David Holmes
2018-10-29 06:20:32 UTC
Permalink
Hi Robbin,
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why all
the cancel operations are no longer needed. I see the handshake VM
operations reusing the initial "threads list" but I'm unclear how they
might be affected if additional threads are added to the system before
the Threads_lock is acquired?

A couple of specific comments:

src/hotspot/share/runtime/handshake.hpp

cancel_inner() is dead now.

---

src/hotspot/share/runtime/handshake.cpp

This was an odd looking for loop before your change and now looks even
more strange:

for ( ; JavaThread *thr = jtiwh.next(); ) {

can it not simply be a more normal looking:

for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {

?

---

Thanks,
David
Post by Robbin Ehn
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different
ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting
JavaThread is present and tries to execute a VM operation, hence
waiting on VM thread to finish the handshake, the JavaThread in the
exit path can never reach the handshake cancellation point. VM thread
cannot finishes the handshake and the third thread is stuck waiting
on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots
a terminated thread it will count that thread is already done by only
clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call
from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed
while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false
positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Robbin Ehn
2018-10-29 09:05:07 UTC
Permalink
Hi David,
Post by Erik Österlund
Hi Robbin,
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why all the
cancel operations are no longer needed. I see the handshake VM operations
reusing the initial "threads list" but I'm unclear how they might be affected if
additional threads are added to the system before the Threads_lock is acquired?
The ThreadsList is a snapshot of all the JavaThreads at that time in the VM.
Handshake all threads only handshake those JavaThreads. We do not care about new
threads.

The typical generic use-case is the similar to RCU. You first update a global
state and emit the handshake when the handshake return no thread can see the old
state.

GlobalFuncPtr = some_new_func;
HandshakeAllThreads;
------------------------------
No thread can be executing the old func.

If the JavaThreads have a local copy of GlobalFuncPtr the handshake operation
would be to update the local copy to some_new_func.

It works for both Java and for VM resources that respect safepoints.
For a pure VM resource it's much cheaper to use the GlobalCounter.

The Threads_lock must only be held for S/R protocol.
With changes to the S/R protocol, such as using handshake instead, we can remove
Threads_lock for handshakes completely. (with a other small fixes)

The cancel is no longer needed since the terminated threads are visible to the
VM thread when we keep the arming threadslist. We add terminated threads as safe
for handshake. But if we handshake a terminated thread we do not execute the
handshake operation, instead just clear the operation and increment the
completed counter. (the VM thread cancels the operation)

I hope that helped?
Post by Erik Österlund
src/hotspot/share/runtime/handshake.hpp
cancel_inner() is dead now.
---
src/hotspot/share/runtime/handshake.cpp
This was an odd looking for loop before your change and now looks even more
 for ( ; JavaThread *thr = jtiwh.next(); ) {
 for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
?
Thanks, fixed with below patch.

/Robbin

diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.cpp
--- a/src/hotspot/share/runtime/handshake.cpp Sun Oct 28 20:57:24 2018 +0100
+++ b/src/hotspot/share/runtime/handshake.cpp Mon Oct 29 09:32:26 2018 +0100
@@ -166,1 +166,1 @@
- for ( ; JavaThread *thr = jtiwh.next(); ) {
+ for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
@@ -198,1 +198,1 @@
- for ( ; JavaThread *thr = jtiwh.next(); ) {
+ for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.hpp
--- a/src/hotspot/share/runtime/handshake.hpp Sun Oct 28 20:57:24 2018 +0100
+++ b/src/hotspot/share/runtime/handshake.hpp Mon Oct 29 09:32:26 2018 +0100
@@ -63,1 +62,0 @@
- void cancel_inner(JavaThread* thread);
Post by Erik Österlund
---
Thanks,
David
Post by Robbin Ehn
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and
the third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from vm
thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
David Holmes
2018-10-30 01:31:51 UTC
Permalink
Thanks for the explanation Robbin.

The inline patch also seems fine. I hope the other reviewers noticed it.

David
Post by Robbin Ehn
Hi David,
Post by Erik Österlund
Hi Robbin,
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why
all the cancel operations are no longer needed. I see the handshake VM
operations reusing the initial "threads list" but I'm unclear how they
might be affected if additional threads are added to the system before
the Threads_lock is acquired?
The ThreadsList is a snapshot of all the JavaThreads at that time in the VM.
Handshake all threads only handshake those JavaThreads. We do not care about new
threads.
The typical generic use-case is the similar to RCU. You first update a global
state and emit the handshake when the handshake return no thread can see the old
state.
GlobalFuncPtr = some_new_func;
HandshakeAllThreads;
------------------------------
No thread can be executing the old func.
If the JavaThreads have a local copy of GlobalFuncPtr the handshake
operation would be to update the local copy to some_new_func.
It works for both Java and for VM resources that respect safepoints.
For a pure VM resource it's much cheaper to use the GlobalCounter.
The Threads_lock must only be held for S/R protocol.
With changes to the S/R protocol, such as using handshake instead, we can remove
Threads_lock for handshakes completely. (with a other small fixes)
The cancel is no longer needed since the terminated threads are visible to the
VM thread when we keep the arming threadslist. We add terminated threads as safe
for handshake. But if we handshake a terminated thread we do not execute the
handshake operation, instead just clear the operation and increment the
completed counter. (the VM thread cancels the operation)
I hope that helped?
Post by Erik Österlund
src/hotspot/share/runtime/handshake.hpp
cancel_inner() is dead now.
---
src/hotspot/share/runtime/handshake.cpp
This was an odd looking for loop before your change and now looks even
  for ( ; JavaThread *thr = jtiwh.next(); ) {
  for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
?
Thanks, fixed with below patch.
/Robbin
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.cpp
--- a/src/hotspot/share/runtime/handshake.cpp    Sun Oct 28 20:57:24
2018 +0100
+++ b/src/hotspot/share/runtime/handshake.cpp    Mon Oct 29 09:32:26
2018 +0100
@@ -166,1 +166,1 @@
-    for ( ; JavaThread *thr = jtiwh.next(); ) {
+    for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
@@ -198,1 +198,1 @@
-          for ( ; JavaThread *thr = jtiwh.next(); ) {
+          for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.hpp
--- a/src/hotspot/share/runtime/handshake.hpp    Sun Oct 28 20:57:24
2018 +0100
+++ b/src/hotspot/share/runtime/handshake.hpp    Mon Oct 29 09:32:26
2018 +0100
@@ -63,1 +62,0 @@
-  void cancel_inner(JavaThread* thread);
Post by Erik Österlund
---
Thanks,
David
Post by Robbin Ehn
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different
ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting
JavaThread is present and tries to execute a VM operation, hence
waiting on VM thread to finish the handshake, the JavaThread in the
exit path can never reach the handshake cancellation point. VM
thread cannot finishes the handshake and the third thread is stuck
waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread
spots a terminated thread it will count that thread is already done
by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call
from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock,
Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed
while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce
false positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
Daniel D. Daugherty
2018-10-30 13:18:25 UTC
Permalink
Post by David Holmes
Thanks for the explanation Robbin.
The inline patch also seems fine. I hope the other reviewers noticed it.
Yes, but I forgot to reply to it.

Thumbs up.

Dan
Post by David Holmes
David
Post by Robbin Ehn
Hi David,
Post by Erik Österlund
Hi Robbin,
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why
all the cancel operations are no longer needed. I see the handshake
VM operations reusing the initial "threads list" but I'm unclear how
they might be affected if additional threads are added to the system
before the Threads_lock is acquired?
The ThreadsList is a snapshot of all the JavaThreads at that time in the VM.
Handshake all threads only handshake those JavaThreads. We do not care about new
threads.
The typical generic use-case is the similar to RCU. You first update a global
state and emit the handshake when the handshake return no thread can see the old
state.
GlobalFuncPtr = some_new_func;
HandshakeAllThreads;
------------------------------
No thread can be executing the old func.
If the JavaThreads have a local copy of GlobalFuncPtr the handshake
operation would be to update the local copy to some_new_func.
It works for both Java and for VM resources that respect safepoints.
For a pure VM resource it's much cheaper to use the GlobalCounter.
The Threads_lock must only be held for S/R protocol.
With changes to the S/R protocol, such as using handshake instead, we can remove
Threads_lock for handshakes completely. (with a other small fixes)
The cancel is no longer needed since the terminated threads are visible to the
VM thread when we keep the arming threadslist. We add terminated threads as safe
for handshake. But if we handshake a terminated thread we do not execute the
handshake operation, instead just clear the operation and increment the
completed counter. (the VM thread cancels the operation)
I hope that helped?
Post by Erik Österlund
src/hotspot/share/runtime/handshake.hpp
cancel_inner() is dead now.
---
src/hotspot/share/runtime/handshake.cpp
This was an odd looking for loop before your change and now looks
  for ( ; JavaThread *thr = jtiwh.next(); ) {
  for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
?
Thanks, fixed with below patch.
/Robbin
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.cpp
--- a/src/hotspot/share/runtime/handshake.cpp    Sun Oct 28 20:57:24
2018 +0100
+++ b/src/hotspot/share/runtime/handshake.cpp    Mon Oct 29 09:32:26
2018 +0100
@@ -166,1 +166,1 @@
-    for ( ; JavaThread *thr = jtiwh.next(); ) {
+    for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
@@ -198,1 +198,1 @@
-          for ( ; JavaThread *thr = jtiwh.next(); ) {
+          for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.hpp
--- a/src/hotspot/share/runtime/handshake.hpp    Sun Oct 28 20:57:24
2018 +0100
+++ b/src/hotspot/share/runtime/handshake.hpp    Mon Oct 29 09:32:26
2018 +0100
@@ -63,1 +62,0 @@
-  void cancel_inner(JavaThread* thread);
Post by Erik Österlund
---
Thanks,
David
Post by Robbin Ehn
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different
ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting
JavaThread is present and tries to execute a VM operation, hence
waiting on VM thread to finish the handshake, the JavaThread in
the exit path can never reach the handshake cancellation point.
VM thread cannot finishes the handshake and the third thread is
stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread
spots a terminated thread it will count that thread is already
done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should
call from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock,
Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be
resumed while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce
false positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not
holding Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume.
runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
Robbin Ehn
2018-10-31 07:03:30 UTC
Permalink
Thanks Dan, Robbin
Post by Daniel D. Daugherty
Post by David Holmes
Thanks for the explanation Robbin.
The inline patch also seems fine. I hope the other reviewers noticed it.
Yes, but I forgot to reply to it.
Thumbs up.
Dan
Post by David Holmes
David
Post by Robbin Ehn
Hi David,
Post by Erik Österlund
Hi Robbin,
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why all the
cancel operations are no longer needed. I see the handshake VM operations
reusing the initial "threads list" but I'm unclear how they might be
affected if additional threads are added to the system before the
Threads_lock is acquired?
The ThreadsList is a snapshot of all the JavaThreads at that time in the VM.
Handshake all threads only handshake those JavaThreads. We do not care about new
threads.
The typical generic use-case is the similar to RCU. You first update a global
state and emit the handshake when the handshake return no thread can see the old
state.
GlobalFuncPtr = some_new_func;
HandshakeAllThreads;
------------------------------
No thread can be executing the old func.
If the JavaThreads have a local copy of GlobalFuncPtr the handshake operation
would be to update the local copy to some_new_func.
It works for both Java and for VM resources that respect safepoints.
For a pure VM resource it's much cheaper to use the GlobalCounter.
The Threads_lock must only be held for S/R protocol.
With changes to the S/R protocol, such as using handshake instead, we can remove
Threads_lock for handshakes completely. (with a other small fixes)
The cancel is no longer needed since the terminated threads are visible to the
VM thread when we keep the arming threadslist. We add terminated threads as safe
for handshake. But if we handshake a terminated thread we do not execute the
handshake operation, instead just clear the operation and increment the
completed counter. (the VM thread cancels the operation)
I hope that helped?
Post by Erik Österlund
src/hotspot/share/runtime/handshake.hpp
cancel_inner() is dead now.
---
src/hotspot/share/runtime/handshake.cpp
This was an odd looking for loop before your change and now looks even more
  for ( ; JavaThread *thr = jtiwh.next(); ) {
  for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
?
Thanks, fixed with below patch.
/Robbin
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.cpp
--- a/src/hotspot/share/runtime/handshake.cpp    Sun Oct 28 20:57:24 2018 +0100
+++ b/src/hotspot/share/runtime/handshake.cpp    Mon Oct 29 09:32:26 2018 +0100
@@ -166,1 +166,1 @@
-    for ( ; JavaThread *thr = jtiwh.next(); ) {
+    for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
@@ -198,1 +198,1 @@
-          for ( ; JavaThread *thr = jtiwh.next(); ) {
+          for (JavaThread *thr = jtiwh.next(); thr != NULL; thr =
jtiwh.next()) {
diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.hpp
--- a/src/hotspot/share/runtime/handshake.hpp    Sun Oct 28 20:57:24 2018 +0100
+++ b/src/hotspot/share/runtime/handshake.hpp    Mon Oct 29 09:32:26 2018 +0100
@@ -63,1 +62,0 @@
-  void cancel_inner(JavaThread* thread);
Post by Erik Österlund
---
Thanks,
David
Post by Robbin Ehn
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread
to finish the handshake, the JavaThread in the exit path can never reach
the handshake cancellation point. VM thread cannot finishes the handshake
and the third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from
vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed while
the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false
positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
Daniel D. Daugherty
2018-10-29 13:52:32 UTC
Permalink
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
src/hotspot/share/runtime/handshake.cpp
    No comments.

test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    No comments.

Thumbs up!

Thanks for making the updates.

Dan
Post by Robbin Ehn
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different
ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting
JavaThread is present and tries to execute a VM operation, hence
waiting on VM thread to finish the handshake, the JavaThread in the
exit path can never reach the handshake cancellation point. VM
thread cannot finishes the handshake and the third thread is stuck
waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots
a terminated thread it will count that thread is already done by
only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread*
target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call
from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock,
Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed
while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false
positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Robbin Ehn
2018-10-29 14:21:09 UTC
Permalink
Thanks Dan!

/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
src/hotspot/share/runtime/handshake.cpp
    No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    No comments.
Thumbs up!
Thanks for making the updates.
Dan
Post by Robbin Ehn
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and
the third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from vm
thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Ro
s***@oracle.com
2018-10-29 17:35:04 UTC
Permalink
Hi Robbin,

+1

Thanks,
Serguei
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
src/hotspot/share/runtime/handshake.cpp
    No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    No comments.
Thumbs up!
Thanks for making the updates.
Dan
Post by Robbin Ehn
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different
ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting
JavaThread is present and tries to execute a VM operation, hence
waiting on VM thread to finish the handshake, the JavaThread in the
exit path can never reach the handshake cancellation point. VM
thread cannot finishes the handshake and the third thread is stuck
waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread
spots a terminated thread it will count that thread is already done
by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread*
target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call
from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock,
Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed
while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce
false positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
Robbin Ehn
2018-10-31 07:04:18 UTC
Permalink
Thanks Serguei, Robbin
Post by Erik Österlund
Hi Robbin,
+1
Thanks,
Serguei
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi Dan,
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
src/hotspot/share/runtime/handshake.cpp
    No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
    No comments.
Thumbs up!
Thanks for making the updates.
Dan
Post by Robbin Ehn
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
/Robbin
Post by Daniel D. Daugherty
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and
the third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
src/hotspot/share/runtime/handshake.hpp
     No comments.
src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from vm
thread");
         Both calls to handshake_process_by_vmthread() which calls this
         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
         Looks like the lock is grabbed because of
         L351:   // An externally suspended thread cannot be resumed while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false
positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }
         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.
         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().
src/hotspot/share/runtime/thread.hpp
     No comments.
src/hotspot/share/runtime/thread.cpp
     No comments.
src/hotspot/share/runtime/threadSMR.cpp
     No comments.
test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.
     L49:         while(!exit_now) {
         nit - please add a space before '('
     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='
         So why every other thread? A comment would be good...
     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.
     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.
Dan
Post by Robbin Ehn
Thanks, Robbin
s***@oracle.com
2018-10-26 17:23:10 UTC
Permalink
Hi Robbin,

The fix looks good to me.
Thank you a lot for identifying and fixing this issue!
Really nice, new jtreg handshake + thread suspend test reproduced this
deadlock.

Thanks,
Serguei
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread
is present and tries to execute a VM operation, hence waiting on VM
thread to finish the handshake, the JavaThread in the exit path can
never reach the handshake cancellation point. VM thread cannot
finishes the handshake and the third thread is stuck waiting on the VM
thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
Thanks, Robbin
Robbin Ehn
2018-10-28 20:02:49 UTC
Permalink
Post by Erik Österlund
Hi Robbin,
The fix looks good to me.
Thank you a lot for identifying and fixing this issue!
Really nice, new jtreg handshake + thread suspend test reproduced this deadlock.
Thanks Serguei!

/Robbin
Post by Erik Österlund
Thanks,
Serguei
Post by Robbin Ehn
Hi, please review.
When the VM thread executes a handshake it uses different ThreadsLists during
the execution. A JavaThread that is armed for the handshake when it is already
in the exit path in VM will cancel the handshake. Even if the VM thread cannot
see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.
But if a third thread takes a ThreadsList where the exiting JavaThread is
present and tries to execute a VM operation, hence waiting on VM thread to
finish the handshake, the JavaThread in the exit path can never reach the
handshake cancellation point. VM thread cannot finishes the handshake and the
third thread is stuck waiting on the VM thread.
To allow holding a ThreadsList when executing a VM operation we instead let the
VM thread use the same ThreadsList over the entire handshake making all armed
threads visible to the VM thread at all time. And if VM thread spots a
terminated thread it will count that thread is already done by only clearing
it's operation.
Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.
Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
Thanks, Robbin
Loading...