Discussion:
RFR 8203174: [Graal] JDI tests fail with Unexpected exception: com.sun.jdi.ObjectCollectedException
Daniil Titov
2018-11-02 23:29:14 UTC
Permalink
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.

There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.

The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174

Thanks,
Daniil
Chris Plummer
2018-11-03 03:59:17 UTC
Permalink
Hi Daniil,

I thought the issue was that C2 was doing metadata allocations, and when
it ran out of metaspace it did a GC (one that was not triggered by a
failed object allocation). As a results we were getting objects GCs
before the test ever got the chance to disable collection on them. I
thought we also concluded other than this metaspace issue, if the test
is properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.

I don't think the following check will always be valid. It may be on by
default at some point:

 651     public boolean isJVMCICompilerOn() {
 652         String opts = argumentHandler.getLaunchOptions();
 653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
 654     }

thanks,

Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Daniil Titov
2018-11-04 20:40:55 UTC
Permalink
Hi Chris,

There were multiple issues here but with JDK-8193126 being fixed I can no longer observe metaspace related issues and suspending the target VM seems as a sufficient to prevent newly created objects being collected before we have a chance to call disableCollection() on them.

I noticed that the check for JFR uses the same approach to detect whether the Flight Recorder is on.

public boolean isJFR_active() {
String opts = argumentHandler.getLaunchOptions();
int jfrPos = opts.indexOf("-XX:+FlightRecorder");
if (jfrPos >= 0)
return true;
else
return false;
}

The other option I had in mind is to check the presence of "JMCI Compiler" threads in the target VM but I less inclined to it since it relies on the thread name that might be changed at some point in the future as well. On more option could be to completely remove checks with "java.lang.String" class regardless JVMCI is on or off but it would reduce the coverage in the case when JVMC is off.

Best regards,
Daniil

On 11/2/18, 8:59 PM, "Chris Plummer" <***@oracle.com> wrote:

Hi Daniil,

I thought the issue was that C2 was doing metadata allocations, and when
it ran out of metaspace it did a GC (one that was not triggered by a
failed object allocation). As a results we were getting objects GCs
before the test ever got the chance to disable collection on them. I
thought we also concluded other than this metaspace issue, if the test
is properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.

I don't think the following check will always be valid. It may be on by
default at some point:

651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }

thanks,

Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
JC Beyler
2018-11-05 02:49:47 UTC
Permalink
Hi Daniil,

Quick question about looking at the launch options, is there any reason you
do not simply use:
sun.hotspot.code.Compiler.isGraalEnabled();

Is it because you do not want to specifically call out Graal?

Thanks,
Jc
Post by Daniil Titov
Hi Chris,
There were multiple issues here but with JDK-8193126 being fixed I can no
longer observe metaspace related issues and suspending the target VM seems
as a sufficient to prevent newly created objects being collected before we
have a chance to call disableCollection() on them.
I noticed that the check for JFR uses the same approach to detect whether
the Flight Recorder is on.
public boolean isJFR_active() {
String opts = argumentHandler.getLaunchOptions();
int jfrPos = opts.indexOf("-XX:+FlightRecorder");
if (jfrPos >= 0)
return true;
else
return false;
}
The other option I had in mind is to check the presence of "JMCI Compiler"
threads in the target VM but I less inclined to it since it relies on the
thread name that might be changed at some point in the future as well. On
more option could be to completely remove checks with "java.lang.String"
class regardless JVMCI is on or off but it would reduce the coverage in the
case when JVMC is off.
Best regards,
Daniil
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and when
it ran out of metaspace it did a GC (one that was not triggered by a
failed object allocation). As a results we were getting objects GCs
before the test ever got the chance to disable collection on them. I
thought we also concluded other than this metaspace issue, if the test
is properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on by
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
Post by Daniil Titov
There main problem here is that with Graal on JVMCI Compiler threads
in the target VM create a lot of objects and sporadically trigger GC that
results in the objects created in the target VM in the tests being GCed
before the tests complete. The other problem is that for some classes the
tests use, e.g. "java.lang.String", there is already a huge number of
instances created by JVMCI threads.
Post by Daniil Titov
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when IOPipe is
used for communication between the test and the debuggee the fix suspends
all threads but the main rather than suspending the VM. The fix also
filters out the checks for some test classes (e.g. "java.lang.String") in
cases when the target VM is run with JVMCI compiler options on.
Post by Daniil Titov
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
--
Thanks,
Jc
s***@oracle.com
2018-11-05 07:52:06 UTC
Permalink
Hi Daniil,

Just wanted to comment this as I understand it.
Post by Daniil Titov
Hi Chris,
There were multiple issues here but with JDK-8193126 being fixed I can no longer observe metaspace related issues and suspending the target VM seems as a sufficient to prevent newly created objects being collected before we have a chance to call disableCollection() on them.
Your change looks like a workaround for the Graal issue that it runs out
of metaspace.
It could be Okay to have such a fix until the underlying issue is fixed.
However, this workaround is most likely unsafe as Dean L. noticed.
Post by Daniil Titov
I noticed that the check for JFR uses the same approach to detect whether the Flight Recorder is on.
public boolean isJFR_active() {
String opts = argumentHandler.getLaunchOptions();
int jfrPos = opts.indexOf("-XX:+FlightRecorder");
if (jfrPos >= 0)
return true;
else
return false;
}
Most likely, this also was a workaround for some other issue related to
the JFR activity.
The best approach is to avoid such workarounds and be more precise on
what is really tested.

Thanks,
Serguei
Post by Daniil Titov
The other option I had in mind is to check the presence of "JMCI Compiler" threads in the target VM but I less inclined to it since it relies on the thread name that might be changed at some point in the future as well. On more option could be to completely remove checks with "java.lang.String" class regardless JVMCI is on or off but it would reduce the coverage in the case when JVMC is off.
Best regards,
Daniil
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and when
it ran out of metaspace it did a GC (one that was not triggered by a
failed object allocation). As a results we were getting objects GCs
before the test ever got the chance to disable collection on them. I
thought we also concluded other than this metaspace issue, if the test
is properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on by
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Chris Plummer
2018-11-05 18:54:08 UTC
Permalink
JFR is a bit different than Graal since it is (and should remain) a
feature that the user will enable, whereas I can see Graal at some point
always being enabled (or be so by default), and you will no longer see
-XX:+UseJVMCICompiler on the command line.

thanks,

Chris
Post by Daniil Titov
Hi Chris,
There were multiple issues here but with JDK-8193126 being fixed I can no longer observe metaspace related issues and suspending the target VM seems as a sufficient to prevent newly created objects being collected before we have a chance to call disableCollection() on them.
I noticed that the check for JFR uses the same approach to detect whether the Flight Recorder is on.
public boolean isJFR_active() {
String opts = argumentHandler.getLaunchOptions();
int jfrPos = opts.indexOf("-XX:+FlightRecorder");
if (jfrPos >= 0)
return true;
else
return false;
}
The other option I had in mind is to check the presence of "JMCI Compiler" threads in the target VM but I less inclined to it since it relies on the thread name that might be changed at some point in the future as well. On more option could be to completely remove checks with "java.lang.String" class regardless JVMCI is on or off but it would reduce the coverage in the case when JVMC is off.
Best regards,
Daniil
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and when
it ran out of metaspace it did a GC (one that was not triggered by a
failed object allocation). As a results we were getting objects GCs
before the test ever got the chance to disable collection on them. I
thought we also concluded other than this metaspace issue, if the test
is properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on by
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
s***@oracle.com
2018-11-05 07:40:06 UTC
Permalink
Hi Daniil,

I agree with Chris below.
Will tell more on reply to your reply.

Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object allocation). As a results we were getting objects
GCs before the test ever got the chance to disable collection on them.
I thought we also concluded other than this metaspace issue, if the
test is properly disabling collection on the objects it cared about,
it shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
 651     public boolean isJVMCICompilerOn() {
 652         String opts = argumentHandler.getLaunchOptions();
 653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
 654     }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads
in the target VM create a lot of objects and sporadically trigger GC
that results in the objects created in the target VM in the tests
being GCed before the tests complete. The other problem is that for
some classes the tests use, e.g. "java.lang.String",  there is
already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when IOPipe
is used for communication between the test and the debuggee the fix
suspends all threads but the main rather than suspending the VM. The
fix also filters out the checks for some test classes (e.g.
"java.lang.String") in cases when the target VM is run with JVMCI
compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
s***@oracle.com
2018-11-05 07:45:18 UTC
Permalink
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.

Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the test
ever got the chance to disable collection on them. I thought we also
concluded other than this metaspace issue, if the test is properly
disabling collection on the objects it cared about, it shouldn't
matter if there are GC's triggered by excessive object allocations.
I don't think the following check will always be valid. It may be on
 651     public boolean isJVMCICompilerOn() {
 652         String opts = argumentHandler.getLaunchOptions();
 653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
 654     }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads
in the target VM create a lot of objects and sporadically trigger GC
that results in the objects created in the target VM in the tests
being GCed before the tests complete. The other problem is that for
some classes the tests use, e.g. "java.lang.String",  there is
already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending the
VM. The fix also filters out the checks for some test classes (e.g.
"java.lang.String") in cases when the target VM is run with JVMCI
compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Chris Plummer
2018-11-05 18:46:41 UTC
Permalink
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.

Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
 651     public boolean isJVMCICompilerOn() {
 652         String opts = argumentHandler.getLaunchOptions();
 653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
 654     }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String", 
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Daniil Titov
2018-11-07 16:58:27 UTC
Permalink
Hi Chris and Serguei,

With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.

Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174

Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.

Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Chris Plummer
2018-11-07 19:33:26 UTC
Permalink
Hi Danil,

So this is the crux of the issue:

 112         debuggee.suspend();
 113
 114         List<ObjectReference> baseReferences = new LinkedList<>();
 115         // We need to ensure that the base references count is not
changed during the test.
 116         // The instances of the class could be already created by
other (e.g. JVMCI) threads
 117         // and if GC was scheduled before VM was suspended some of
these instances might
 118         // become collected.
 119         for (ObjectReference objRef : referenceType.instances(0)) {
 120             try {
 121                 objRef.disableCollection();
 122                 baseReferences.add(objRef);
 123             } catch (ObjectCollectedException e) {
 124                 // skip this reference
 125             }
 126         }
 127         baseInstances = baseReferences.size();

First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.

I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.

I don't understand the reason for excluding java.lang.String from testing.

I don't understand the reason for the useStrictCheck() changes.

I don't understand what the following is fixing, and the impact on test
execution that the resume() might have:

 153             if (i > 0) {
 154                 debuggee.resume();
 155             }
 156
 157             line = pipe.readln();
 158             debuggee.suspend();


thanks,

Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Daniil Titov
2018-11-08 01:11:36 UTC
Permalink
Hi Chris,

Thank you for your comments.

Excluding of java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .

The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck() defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of instances found.

The changes below are required since we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.

153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();


Please review a new version of the fix with the changes mentioned above.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174

Thanks,
Daniil

On 11/7/18, 11:33 AM, "Chris Plummer" <***@oracle.com> wrote:

Hi Danil,

So this is the crux of the issue:

112 debuggee.suspend();
113
114 List<ObjectReference> baseReferences = new LinkedList<>();
115 // We need to ensure that the base references count is not
changed during the test.
116 // The instances of the class could be already created by
other (e.g. JVMCI) threads
117 // and if GC was scheduled before VM was suspended some of
these instances might
118 // become collected.
119 for (ObjectReference objRef : referenceType.instances(0)) {
120 try {
121 objRef.disableCollection();
122 baseReferences.add(objRef);
123 } catch (ObjectCollectedException e) {
124 // skip this reference
125 }
126 }
127 baseInstances = baseReferences.size();

First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.

I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.

I don't understand the reason for excluding java.lang.String from testing.

I don't understand the reason for the useStrictCheck() changes.

I don't understand what the following is fixing, and the impact on test
execution that the resume() might have:

153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();


thanks,

Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Chris Plummer
2018-11-08 21:08:52 UTC
Permalink
Hi Daniil,

Hi Daniil,

I was wondering why instance003 did not include the same fix as
instance002. In fact I was wondering why all the tests did not include
this fix. It looks like you have now fixed instance003 the same way as
instance002, but I'm unsure of how all the other tests are being fixed.

What is the fix now for MonitorEvents002, heapwalking001,
heapwalking002, and mixed002. Are these all covered by the
useStrictCheck() fix?

For the newinstance tests, I understand what you are saying about
needing to resume the VM to read the response, but I'm not sure how this
manifested itself as a problem before the fix and how it relates to the
reported failures, since as far as I can tell you did not add a suspend,
so it must have been in place already.

Maybe what's going on here is that there are actually 3 separate issues,
which is making it hard to figure out what the failure mode is for each
test, and how the changes are addressing it.

thanks,

Chris
Post by Daniil Titov
Hi Chris,
Thank you for your comments.
Excluding of java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .
The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck() defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of instances found.
The changes below are required since we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
Please review a new version of the fix with the changes mentioned above.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Hi Danil,
112 debuggee.suspend();
113
114 List<ObjectReference> baseReferences = new LinkedList<>();
115 // We need to ensure that the base references count is not
changed during the test.
116 // The instances of the class could be already created by
other (e.g. JVMCI) threads
117 // and if GC was scheduled before VM was suspended some of
these instances might
118 // become collected.
119 for (ObjectReference objRef : referenceType.instances(0)) {
120 try {
121 objRef.disableCollection();
122 baseReferences.add(objRef);
123 } catch (ObjectCollectedException e) {
124 // skip this reference
125 }
126 }
127 baseInstances = baseReferences.size();
First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.
I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.
I don't understand the reason for excluding java.lang.String from testing.
I don't understand the reason for the useStrictCheck() changes.
I don't understand what the following is fixing, and the impact on test
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
thanks,
Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Daniil Titov
2018-11-08 23:12:21 UTC
Permalink
Hi Chris,

You are right. There are different cases here. Please find them listed below.

1. Test vmTestbase/nsk/jdi/stress/MonitorEvents/MonitorEvents002
This test was actually fixed with JDK-8206086 changes. As far as I remember it was not removed from ProblemList-graal.txt due to some sporadic failures (not related to ObjectCollectedException), however with the latest builds these failures are no longer reproduced .

2. "NewInstance" tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance002.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance003.java)

The typical failure for these tests Is that the test creates a new array with arrayType.newInstance(arraylength) but this object becomes immediately GC'ed. Calling disableCollection() on the newly created instance doesn't help since by that time the object might be already GC'ed.


com.sun.jdi.ObjectCollectedException at jdk.jdi/com.sun.tools.jdi.JDWPException.toJDIException(JDWPException.java:55)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValues(ArrayReferenceImpl.java:126)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValue(ArrayReferenceImpl.java:77)
at nsk.jdi.ArrayType.newInstance.newinstance001.runThis(newinstance001.java:316)
at nsk.jdi.ArrayType.newInstance.newinstance001.run(newinstance001.java:84)
at nsk.jdi.ArrayType.newInstance.newinstance001.main(newinstance001.java:79)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
at java.base/java.lang.Thread.run(Thread.java:835)

The proposed fix for this case is to suspend VM while the body of the "for" loop (lines 150 - 356) is executed while ensuring that for line 151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers are given for an original version of test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)

147 //------------------------------------------------------ testing section
148 log1(" TESTING BEGINS");
149
150 for (int i = 0; ; i++) {
151 pipe.println("newcheck");
152 line = pipe.readln();
153
<skipped>

356 }
357 log1(" TESTING ENDS");
358

3. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances002/instances002.java
There were 2 issues here:
a) The same issue as above. On line 125 calling disableCollection () on newly created array object (arrayType.newInstance()) resulted in ObjectCollectedException was thrown
The fix here is to suspend VM while the testing code is executed.
b) IndexOutOfBoundException on line 130 (checkDebugeeAnswer_instances(className, createInstanceCount + baseInstances) since some of base instances that were created before VM was suspended become collected if GC was triggered before VM was suspended.
The fix here is to call disableCollection() an all base instances and filter out ones that were already collected.

4. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java
The problem here was ObjectCollectedException thrown on line 141 while iterating over newly created instances since these instances also include ones created by other java threads and some of them become GC'ed before disableCollection() is called on them. In this test the new instances are created by debuggee itself (the test sends the commands to debuggee to create or delete instances using pipe) so here we cannot use suspend VM approach.
The fix is to filter out already collected instances ( these are the instances created outside of the test) and ensure that instances created by the test are not GC'ed before we call disableCollection() on them. For the latter the fix just extends the approach used by the test for WEAK references ( to ask debuggee to create a temporary strong reference and then delete them after we call disableCollection() ) to other reference types.

5. Stress tests vmTestbase/nsk/jdi/stress/serial/heapwalking001/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/heapwalking002/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
Two problems here:
1) These tests use primitive arrays as test classes and these classes might be created by other java threads outside the test . Changes in useStrictCheck() fix this.
2) These tests include test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java test that is fixed at 4 above


Thanks,
Daniil

On 11/8/18, 1:10 PM, "Chris Plummer" <***@oracle.com> wrote:

Hi Daniil,

Hi Daniil,

I was wondering why instance003 did not include the same fix as
instance002. In fact I was wondering why all the tests did not include
this fix. It looks like you have now fixed instance003 the same way as
instance002, but I'm unsure of how all the other tests are being fixed.

What is the fix now for MonitorEvents002, heapwalking001,
heapwalking002, and mixed002. Are these all covered by the
useStrictCheck() fix?

For the newinstance tests, I understand what you are saying about
needing to resume the VM to read the response, but I'm not sure how this
manifested itself as a problem before the fix and how it relates to the
reported failures, since as far as I can tell you did not add a suspend,
so it must have been in place already.

Maybe what's going on here is that there are actually 3 separate issues,
which is making it hard to figure out what the failure mode is for each
test, and how the changes are addressing it.

thanks,

Chris
Post by Daniil Titov
Hi Chris,
Thank you for your comments.
Excluding of java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .
The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck() defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of instances found.
The changes below are required since we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
Please review a new version of the fix with the changes mentioned above.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Hi Danil,
112 debuggee.suspend();
113
114 List<ObjectReference> baseReferences = new LinkedList<>();
115 // We need to ensure that the base references count is not
changed during the test.
116 // The instances of the class could be already created by
other (e.g. JVMCI) threads
117 // and if GC was scheduled before VM was suspended some of
these instances might
118 // become collected.
119 for (ObjectReference objRef : referenceType.instances(0)) {
120 try {
121 objRef.disableCollection();
122 baseReferences.add(objRef);
123 } catch (ObjectCollectedException e) {
124 // skip this reference
125 }
126 }
127 baseInstances = baseReferences.size();
First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.
I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.
I don't understand the reason for excluding java.lang.String from testing.
I don't understand the reason for the useStrictCheck() changes.
I don't understand what the following is fixing, and the impact on test
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
thanks,
Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Chris Plummer
2018-11-09 01:11:51 UTC
Permalink
Hi Daniil,
Post by Daniil Titov
Hi Chris,
You are right. There are different cases here. Please find them listed below.
1. Test vmTestbase/nsk/jdi/stress/MonitorEvents/MonitorEvents002
This test was actually fixed with JDK-8206086 changes. As far as I remember it was not removed from ProblemList-graal.txt due to some sporadic failures (not related to ObjectCollectedException), however with the latest builds these failures are no longer reproduced .
OK.
Post by Daniil Titov
2. "NewInstance" tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance002.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance003.java)
The typical failure for these tests Is that the test creates a new array with arrayType.newInstance(arraylength) but this object becomes immediately GC'ed. Calling disableCollection() on the newly created instance doesn't help since by that time the object might be already GC'ed.
com.sun.jdi.ObjectCollectedException at jdk.jdi/com.sun.tools.jdi.JDWPException.toJDIException(JDWPException.java:55)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValues(ArrayReferenceImpl.java:126)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValue(ArrayReferenceImpl.java:77)
at nsk.jdi.ArrayType.newInstance.newinstance001.runThis(newinstance001.java:316)
at nsk.jdi.ArrayType.newInstance.newinstance001.run(newinstance001.java:84)
at nsk.jdi.ArrayType.newInstance.newinstance001.main(newinstance001.java:79)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
at java.base/java.lang.Thread.run(Thread.java:835)
The proposed fix for this case is to suspend VM while the body of the "for" loop (lines 150 - 356) is executed while ensuring that for line 151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers are given for an original version of test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
147 //------------------------------------------------------ testing section
148 log1(" TESTING BEGINS");
149
150 for (int i = 0; ; i++) {
151 pipe.println("newcheck");
152 line = pipe.readln();
153
<skipped>
356 }
357 log1(" TESTING ENDS");
358
Ok. I get it now. It was confusing because there is a resume before the
suspend, but I see now why it was done that way.  I assume you
originally tried just doing the suspend outside of the loop, but found
that caused problems for pipe.readln(), so you moved it to inside the
loop and after the pipe.readln().
Post by Daniil Titov
3. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances002/instances002.java
a) The same issue as above. On line 125 calling disableCollection () on newly created array object (arrayType.newInstance()) resulted in ObjectCollectedException was thrown
The fix here is to suspend VM while the testing code is executed.
b) IndexOutOfBoundException on line 130 (checkDebugeeAnswer_instances(className, createInstanceCount + baseInstances) since some of base instances that were created before VM was suspended become collected if GC was triggered before VM was suspended.
The fix here is to call disableCollection() an all base instances and filter out ones that were already collected.
Ok.
Post by Daniil Titov
4. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java
The problem here was ObjectCollectedException thrown on line 141 while iterating over newly created instances since these instances also include ones created by other java threads and some of them become GC'ed before disableCollection() is called on them. In this test the new instances are created by debuggee itself (the test sends the commands to debuggee to create or delete instances using pipe) so here we cannot use suspend VM approach.
The fix is to filter out already collected instances ( these are the instances created outside of the test) and ensure that instances created by the test are not GC'ed before we call disableCollection() on them. For the latter the fix just extends the approach used by the test for WEAK references ( to ask debuggee to create a temporary strong reference and then delete them after we call disableCollection() ) to other reference types.
Ok.
Post by Daniil Titov
5. Stress tests vmTestbase/nsk/jdi/stress/serial/heapwalking001/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/heapwalking002/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
1) These tests use primitive arrays as test classes and these classes might be created by other java threads outside the test . Changes in useStrictCheck() fix this.
2) These tests include test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java test that is fixed at 4 above
Ok.

Thanks for describing the separate issues. Can you make sure the CR is
updated with this info?

thanks,

Chris
Post by Daniil Titov
Thanks,
Daniil
Hi Daniil,
Hi Daniil,
I was wondering why instance003 did not include the same fix as
instance002. In fact I was wondering why all the tests did not include
this fix. It looks like you have now fixed instance003 the same way as
instance002, but I'm unsure of how all the other tests are being fixed.
What is the fix now for MonitorEvents002, heapwalking001,
heapwalking002, and mixed002. Are these all covered by the
useStrictCheck() fix?
For the newinstance tests, I understand what you are saying about
needing to resume the VM to read the response, but I'm not sure how this
manifested itself as a problem before the fix and how it relates to the
reported failures, since as far as I can tell you did not add a suspend,
so it must have been in place already.
Maybe what's going on here is that there are actually 3 separate issues,
which is making it hard to figure out what the failure mode is for each
test, and how the changes are addressing it.
thanks,
Chris
Post by Daniil Titov
Hi Chris,
Thank you for your comments.
Excluding of java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .
The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck() defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of instances found.
The changes below are required since we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
Please review a new version of the fix with the changes mentioned above.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Hi Danil,
112 debuggee.suspend();
113
114 List<ObjectReference> baseReferences = new LinkedList<>();
115 // We need to ensure that the base references count is not
changed during the test.
116 // The instances of the class could be already created by
other (e.g. JVMCI) threads
117 // and if GC was scheduled before VM was suspended some of
these instances might
118 // become collected.
119 for (ObjectReference objRef : referenceType.instances(0)) {
120 try {
121 objRef.disableCollection();
122 baseReferences.add(objRef);
123 } catch (ObjectCollectedException e) {
124 // skip this reference
125 }
126 }
127 baseInstances = baseReferences.size();
First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.
I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.
I don't understand the reason for excluding java.lang.String from testing.
I don't understand the reason for the useStrictCheck() changes.
I don't understand what the following is fixing, and the impact on test
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
thanks,
Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Daniil Titov
2018-11-09 15:47:30 UTC
Permalink
Hi Chris,

I updated the bug with the info about all these separate issues.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174

Thanks,
Daniil

On 11/8/18, 5:11 PM, "Chris Plummer" <***@oracle.com> wrote:

Hi Daniil,
Post by Daniil Titov
Hi Chris,
You are right. There are different cases here. Please find them listed below.
1. Test vmTestbase/nsk/jdi/stress/MonitorEvents/MonitorEvents002
This test was actually fixed with JDK-8206086 changes. As far as I remember it was not removed from ProblemList-graal.txt due to some sporadic failures (not related to ObjectCollectedException), however with the latest builds these failures are no longer reproduced .
OK.
Post by Daniil Titov
2. "NewInstance" tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance002.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance003.java)
The typical failure for these tests Is that the test creates a new array with arrayType.newInstance(arraylength) but this object becomes immediately GC'ed. Calling disableCollection() on the newly created instance doesn't help since by that time the object might be already GC'ed.
com.sun.jdi.ObjectCollectedException at jdk.jdi/com.sun.tools.jdi.JDWPException.toJDIException(JDWPException.java:55)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValues(ArrayReferenceImpl.java:126)
at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValue(ArrayReferenceImpl.java:77)
at nsk.jdi.ArrayType.newInstance.newinstance001.runThis(newinstance001.java:316)
at nsk.jdi.ArrayType.newInstance.newinstance001.run(newinstance001.java:84)
at nsk.jdi.ArrayType.newInstance.newinstance001.main(newinstance001.java:79)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
at java.base/java.lang.Thread.run(Thread.java:835)
The proposed fix for this case is to suspend VM while the body of the "for" loop (lines 150 - 356) is executed while ensuring that for line 151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers are given for an original version of test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
147 //------------------------------------------------------ testing section
148 log1(" TESTING BEGINS");
149
150 for (int i = 0; ; i++) {
151 pipe.println("newcheck");
152 line = pipe.readln();
153
<skipped>
356 }
357 log1(" TESTING ENDS");
358
Ok. I get it now. It was confusing because there is a resume before the
suspend, but I see now why it was done that way. I assume you
originally tried just doing the suspend outside of the loop, but found
that caused problems for pipe.readln(), so you moved it to inside the
loop and after the pipe.readln().
Post by Daniil Titov
3. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances002/instances002.java
a) The same issue as above. On line 125 calling disableCollection () on newly created array object (arrayType.newInstance()) resulted in ObjectCollectedException was thrown
The fix here is to suspend VM while the testing code is executed.
b) IndexOutOfBoundException on line 130 (checkDebugeeAnswer_instances(className, createInstanceCount + baseInstances) since some of base instances that were created before VM was suspended become collected if GC was triggered before VM was suspended.
The fix here is to call disableCollection() an all base instances and filter out ones that were already collected.
Ok.
Post by Daniil Titov
4. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java
The problem here was ObjectCollectedException thrown on line 141 while iterating over newly created instances since these instances also include ones created by other java threads and some of them become GC'ed before disableCollection() is called on them. In this test the new instances are created by debuggee itself (the test sends the commands to debuggee to create or delete instances using pipe) so here we cannot use suspend VM approach.
The fix is to filter out already collected instances ( these are the instances created outside of the test) and ensure that instances created by the test are not GC'ed before we call disableCollection() on them. For the latter the fix just extends the approach used by the test for WEAK references ( to ask debuggee to create a temporary strong reference and then delete them after we call disableCollection() ) to other reference types.
Ok.
Post by Daniil Titov
5. Stress tests vmTestbase/nsk/jdi/stress/serial/heapwalking001/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/heapwalking002/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
1) These tests use primitive arrays as test classes and these classes might be created by other java threads outside the test . Changes in useStrictCheck() fix this.
2) These tests include test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java test that is fixed at 4 above
Ok.

Thanks for describing the separate issues. Can you make sure the CR is
updated with this info?

thanks,

Chris
Post by Daniil Titov
Thanks,
Daniil
Hi Daniil,
Hi Daniil,
I was wondering why instance003 did not include the same fix as
instance002. In fact I was wondering why all the tests did not include
this fix. It looks like you have now fixed instance003 the same way as
instance002, but I'm unsure of how all the other tests are being fixed.
What is the fix now for MonitorEvents002, heapwalking001,
heapwalking002, and mixed002. Are these all covered by the
useStrictCheck() fix?
For the newinstance tests, I understand what you are saying about
needing to resume the VM to read the response, but I'm not sure how this
manifested itself as a problem before the fix and how it relates to the
reported failures, since as far as I can tell you did not add a suspend,
so it must have been in place already.
Maybe what's going on here is that there are actually 3 separate issues,
which is making it hard to figure out what the failure mode is for each
test, and how the changes are addressing it.
thanks,
Chris
Post by Daniil Titov
Hi Chris,
Thank you for your comments.
Excluding of java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .
The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck() defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of instances found.
The changes below are required since we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
Please review a new version of the fix with the changes mentioned above.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Hi Danil,
112 debuggee.suspend();
113
114 List<ObjectReference> baseReferences = new LinkedList<>();
115 // We need to ensure that the base references count is not
changed during the test.
116 // The instances of the class could be already created by
other (e.g. JVMCI) threads
117 // and if GC was scheduled before VM was suspended some of
these instances might
118 // become collected.
119 for (ObjectReference objRef : referenceType.instances(0)) {
120 try {
121 objRef.disableCollection();
122 baseReferences.add(objRef);
123 } catch (ObjectCollectedException e) {
124 // skip this reference
125 }
126 }
127 baseInstances = baseReferences.size();
First it is possible for a GC to be triggered even if the debuggee is
not executing any code because of JVMCI threads. So this is why
debuggee.suspend() is needed. Second, even after calling
debuggee.suspend(), it is still possible for a GC to happen since it may
have been triggered before the debuggee.suspend() call, but not yet
completed, and debuggee.suspend() does not prevent GC from completing in
that case. So that is why you need to disableCollecion() on each object,
and accept that some of them may have already been collected. Therefore
baseInstances needs to be updated to only reflect the count of instances
that have not been collected, and will not be collected because
disableCollection() has been called on them. This all makes sense and
seems fine.
I do think the comments could use some updating. The debuggee.suspend()
call should be explained as being needed because there are potentially
other non-test java threads allocating objects and triggering GC's,
JVMCI being the main culprit here. The comment before the loop should
focus on dealing with a GC that was triggered before the suspend,
requiring disableCollection() be called each object returned by
referenceType.instances() since it can potentially be collected
otherwise. This code is not really related to the JVMCI threads.
I don't understand the reason for excluding java.lang.String from testing.
I don't understand the reason for the useStrictCheck() changes.
I don't understand what the following is fixing, and the impact on test
153 if (i > 0) {
154 debuggee.resume();
155 }
156
157 line = pipe.readln();
158 debuggee.suspend();
thanks,
Chris
Post by Daniil Titov
Hi Chris and Serguei,
With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Post by s***@oracle.com
Post by Chris Plummer
Hi Daniil,
I agree with Chris below.
Will tell more on reply to your reply.
Thanks,
Serguei
Post by Chris Plummer
Hi Daniil,
I thought the issue was that C2 was doing metadata allocations, and
when it ran out of metaspace it did a GC (one that was not triggered
by a failed object
Forgot to comment on the below.
It is probably a typo.
Should it be about the Graal, not the C2?
As I understand we have no issue with the C2.
Actually it was the C1 Compiler Thread that was the problem, although it
only turned up during Graal testing.
Chris
Post by s***@oracle.com
Thanks,
Serguei
Post by Chris Plummer
Post by Chris Plummer
allocation). As a results we were getting objects GCs before the
test ever got the chance to disable collection on them. I thought we
also concluded other than this metaspace issue, if the test is
properly disabling collection on the objects it cared about, it
shouldn't matter if there are GC's triggered by excessive object
allocations.
I don't think the following check will always be valid. It may be on
651 public boolean isJVMCICompilerOn() {
652 String opts = argumentHandler.getLaunchOptions();
653 return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
654 }
thanks,
Chris
Post by Daniil Titov
Please review the change that fixes several tests failing with
com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler
threads in the target VM create a lot of objects and sporadically
trigger GC that results in the objects created in the target VM in
the tests being GCed before the tests complete. The other problem
is that for some classes the tests use, e.g. "java.lang.String",
there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from
creating new objects during the tests execution. In cases when
IOPipe is used for communication between the test and the debuggee
the fix suspends all threads but the main rather than suspending
the VM. The fix also filters out the checks for some test classes
(e.g. "java.lang.String") in cases when the target VM is run with
JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
d***@oracle.com
2018-11-09 18:19:26 UTC
Permalink
With OSR, a compile can be initiated in a for loop, so I'd say this is
still not safe if the compile can be blocking, which happens if:

    bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;

So that probably means the test cannot be run with -Xcomp.

dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of the "for" loop (lines 150 - 356) is executed while ensuring that for line 151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers are given for an original version of test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
Chris Plummer
2018-11-09 19:42:26 UTC
Permalink
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the ones
they want to see executed. You're saying there are situations in which
the resumed thread will end up blocking on the suspended compiler thread(s).

Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this is
    bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of the
"for" loop (lines 150 - 356) is executed while ensuring that for line
151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers
are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
s***@oracle.com
2018-11-09 20:14:41 UTC
Permalink
We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.

Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
    bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
Daniil Titov
2018-11-09 20:26:20 UTC
Permalink
Hi Dean,

The blocking issue for suspended JVMCI compiler thread in case of Graal and -XComp was recently solved in JDK-8195627 " Graal] nsk/jdi/VirtualMachine/redefineClasses/redefineclasses026 hangs with Graal in Xcomp mode" . Could you please say do you think it will not be sufficient for the case you described?

Just in case please find below the links to Jira and review thread for JDK-8195627:
Review thread : http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-October/025764.html
Jira issue: https://bugs.openjdk.java.net/browse/JDK-8195627

Thanks,
Daniil

On 11/9/18, 12:14 PM, "***@oracle.com" <***@oracle.com> wrote:

We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.

Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
d***@oracle.com
2018-11-09 21:00:24 UTC
Permalink
I see, eventually it should stop waiting and timeout.  That seems fine.

dl
Post by Daniil Titov
Hi Dean,
The blocking issue for suspended JVMCI compiler thread in case of Graal and -XComp was recently solved in JDK-8195627 " Graal] nsk/jdi/VirtualMachine/redefineClasses/redefineclasses026 hangs with Graal in Xcomp mode" . Could you please say do you think it will not be sufficient for the case you described?
Review thread : http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-October/025764.html
Jira issue: https://bugs.openjdk.java.net/browse/JDK-8195627
Thanks,
Daniil
We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.
Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
s***@oracle.com
2018-11-13 20:05:43 UTC
Permalink
Hi Daniil,

I do not see the latest webrev in this email anymore.
Is it still the v3 version?

If so then the fix looks good to me.
I like the comments you added there as they help.

I was also concerned about the compiler issue.
Thank you for pointing that it was resolved with the JDK-8195627.

Thanks,
Serguei
Post by Daniil Titov
Hi Dean,
The blocking issue for suspended JVMCI compiler thread in case of Graal and -XComp was recently solved in JDK-8195627 " Graal] nsk/jdi/VirtualMachine/redefineClasses/redefineclasses026 hangs with Graal in Xcomp mode" . Could you please say do you think it will not be sufficient for the case you described?
Review thread : http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-October/025764.html
Jira issue: https://bugs.openjdk.java.net/browse/JDK-8195627
Thanks,
Daniil
We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.
Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
Daniil Titov
2018-11-13 20:50:19 UTC
Permalink
Hi Serguei,

Thank you for reviewing this change. You are right, the latest webrev is still v3.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174

Best regards,
Daniil

On 11/13/18, 12:05 PM, "***@oracle.com" <***@oracle.com> wrote:

Hi Daniil,

I do not see the latest webrev in this email anymore.
Is it still the v3 version?

If so then the fix looks good to me.
I like the comments you added there as they help.

I was also concerned about the compiler issue.
Thank you for pointing that it was resolved with the JDK-8195627.

Thanks,
Serguei
Post by Daniil Titov
Hi Dean,
The blocking issue for suspended JVMCI compiler thread in case of Graal and -XComp was recently solved in JDK-8195627 " Graal] nsk/jdi/VirtualMachine/redefineClasses/redefineclasses026 hangs with Graal in Xcomp mode" . Could you please say do you think it will not be sufficient for the case you described?
Review thread : http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-October/025764.html
Jira issue: https://bugs.openjdk.java.net/browse/JDK-8195627
Thanks,
Daniil
We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.
Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
s***@oracle.com
2018-11-13 20:53:51 UTC
Permalink
Then consider it reviewed.

Thanks!
Serguei
Post by Daniil Titov
Hi Serguei,
Thank you for reviewing this change. You are right, the latest webrev is still v3.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Best regards,
Daniil
Hi Daniil,
I do not see the latest webrev in this email anymore.
Is it still the v3 version?
If so then the fix looks good to me.
I like the comments you added there as they help.
I was also concerned about the compiler issue.
Thank you for pointing that it was resolved with the JDK-8195627.
Thanks,
Serguei
Post by Daniil Titov
Hi Dean,
The blocking issue for suspended JVMCI compiler thread in case of Graal and -XComp was recently solved in JDK-8195627 " Graal] nsk/jdi/VirtualMachine/redefineClasses/redefineclasses026 hangs with Graal in Xcomp mode" . Could you please say do you think it will not be sufficient for the case you described?
Review thread : http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-October/025764.html
Jira issue: https://bugs.openjdk.java.net/browse/JDK-8195627
Thanks,
Daniil
We need to understand/formulate what requirements/limits the Graal
compiler has to follow and then negotiate it with the Compiler team.
I feel it is not a good idea to work around these issues to make the
tests to pass.
Sorry that I've lost track of the discussion - will need to re-read the
email thread thoroughly.
Thanks,
Serguei
Post by Chris Plummer
This sounds like a general issue with debugging and graal. Debuggers
expect to be able to suspend all threads, and then resume just the
ones they want to see executed. You're saying there are situations in
which the resumed thread will end up blocking on the suspended
compiler thread(s).
Chris
Post by d***@oracle.com
With OSR, a compile can be initiated in a for loop, so I'd say this
bool is_blocking = !directive->BackgroundCompilationOption ||
CompileTheWorld || ReplayCompiles;
So that probably means the test cannot be run with -Xcomp.
dl
Post by Daniil Titov
The proposed fix for this case is to suspend VM while the body of
the "for" loop (lines 150 - 356) is executed while ensuring that for
line 151 ( line = pipe.printlln()) VM is temporary resumed ( line
numbers are given for an original version of
test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
d***@oracle.com
2018-11-03 07:16:39 UTC
Permalink
I don't think suspending compiler threads is safe, especially if
compiles are blocking/synchronous.

dl
Post by Daniil Titov
Please review the change that fixes several tests failing with com.sun.jdi.ObjectCollectedException when running with Graal.
There main problem here is that with Graal on JVMCI Compiler threads in the target VM create a lot of objects and sporadically trigger GC that results in the objects created in the target VM in the tests being GCed before the tests complete. The other problem is that for some classes the tests use, e.g. "java.lang.String", there is already a huge number of instances created by JVMCI threads.
The fix suspends target VM to prevent JVMCI compiler threads from creating new objects during the tests execution. In cases when IOPipe is used for communication between the test and the debuggee the fix suspends all threads but the main rather than suspending the VM. The fix also filters out the checks for some test classes (e.g. "java.lang.String") in cases when the target VM is run with JVMCI compiler options on.
Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
Thanks,
Daniil
Loading...