Discussion:
Debugging GC crash
Elias Naur
2006-08-30 22:02:00 UTC
Permalink
Hi,

Running a batch program involving a lot of XML parsing I got a GC related
crash reproduced below. The trace itself is probably not useful for tracking
down the problem, so I'd like to know if there are some generic hints or flags to
help track this down. The RVM is built with the 'prototype' configuration,
but some simpler configuration might exist.

I've tried -X:gc:sanityCheck=true which gives some "ERROR" lines before the
internal error crash (see below). The errors on the "los" (Large Object
Space?) space are followed by a huge number of similar errors in the boot
space at the next sanity check round.

I've also tried BaseBaseMarkSweep and BaseBaseGenCopy giving me the same
errors with -X:gc:sanityCheck=true.

(The first line is the last line of -verbose:gc output, showing that the crash
happened at a "full heap" GC)

[Full heap][GC 163 Start 69.96 s 31100KB getObjectType: objRef = 0x5f3443b4
tib = 0x5e3a243c
tib's type is not Object[]
vm internal error at:
-- Stack --
Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1079
Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 577
Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 558
Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 538
Lcom/ibm/JikesRVM/mm/mmtk/ObjectModel; getObjectType(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/mmtk/utility/scan/MMType; at line 393
Lorg/mmtk/utility/scan/Scan; scanObject(Lorg/mmtk/plan/TraceLocal;Lorg/vmmagic/unboxed/ObjectReference;)V at line 33
Lorg/mmtk/plan/TraceLocal; scanObject(Lorg/vmmagic/unboxed/ObjectReference;)V at line 148
Lorg/mmtk/plan/TraceLocal; completeTrace()V at line 469
Lorg/mmtk/plan/TraceLocal; startTrace()V at line 455
Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(IZ)V at line 131
Lorg/mmtk/plan/SimplePhase; delegatePhase()V at line 122
Lorg/mmtk/plan/Phase; delegatePhase(Lorg/mmtk/plan/Phase;)V at line 155
Lorg/mmtk/plan/Phase; delegatePhase(I)V at line 141
Lorg/mmtk/plan/ComplexPhase; delegatePhase()V at line 95
Lorg/mmtk/plan/Phase; delegatePhase(Lorg/mmtk/plan/Phase;)V at line 155
Lorg/mmtk/plan/Phase; delegatePhase(I)V at line 141
Lorg/mmtk/plan/ComplexPhase; delegatePhase()V at line 95
Lorg/mmtk/plan/Phase; delegatePhase(Lorg/mmtk/plan/Phase;)V at line 155
Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 54
Lcom/ibm/JikesRVM/memoryManagers/mmInterface/VM_CollectorThread; run()V at line 341
Lcom/ibm/JikesRVM/VM_Thread; startoff()V at line 781
Java Result: 124

The errors from -X:gc:sanityCheck=true

[java] [Full heap][GC 13 Start 31.90 s 20280KB
[java] ============================== GC Sanity Checking ==============================
[java] Performing Sanity Checks...
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b01c018 [los] [B
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b060018 [los] [B
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b0a4018 [los] [Ljava/util/HashMap$HashEntry;
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b0c3018 [los] [Ljava/util/HashMap$HashEntry;
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b0ce018 [los] [B
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b112018 [los] [B
[java] ERROR: SanityRC = 2, SpaceRC = 0 0x5b181018 [los] [Lcom/ibm/JikesRVM/VM_Code;
[java] ERROR: SanityRC = 0, SpaceRC = 0 0x5b19a018 [los] [Lcom/ibm/JikesRVM/classloader/VM_Atom;
[java] ERROR: SanityRC = 1, SpaceRC = 0 0x5b1b1018 [los] [Ljava/util/HashMap$HashEntry;
[java] roots objects refs null
[java] 11604 377816 1172075 335836
[java] ================================================================================

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
Eliot Moss
2006-08-31 00:18:01 UTC
Permalink
Apparently some reachable byte arrays are being reclaimed.

Given that sanity check uses the same GC maps as GC does (for finding
pointers in the stack), it is presumably not a bad GC map that is the
problem. Rather, it is a write barrier or a failure in the GC itself.

Perhaps someone who has maintained or updated these collectors recently can
debug this or offer more guidance.

-- Eliot

Steve Blackburn
2006-08-31 00:30:09 UTC
Permalink
Thanks very much for this.

If you can give me instructions on exactly how to reproduce it (with a
benchmark I will have access to, for example), that would be great.
Given that it shows up on a BaseBase build, it should be a relatively
quick one to nail. Eliot's synopsis seems right, though write barriers
are presumably not the issue given that MarkSweep shows the problem.
How many processors do you use? If you're not sure, try reproducing it
with -X:processors=1 (this should remove GC-time concurrency as a likely
source of the problem).

--Steve
_______________________________________________
Jikesrvm-researchers mailing list
https://lists.sourceforge.net/lists/listinfo/jikesrvm-researchers
--
--Steve

Research Fellow, Australian National University
phone: +61 2 6125 4821 fax: +61 2 6125 0010
http://cs.anu.edu.au/~Steve.Blackburn


Elias Naur
2006-08-31 10:19:51 UTC
Permalink
Post by Steve Blackburn
Thanks very much for this.
If you can give me instructions on exactly how to reproduce it (with a
benchmark I will have access to, for example), that would be great.
Given that it shows up on a BaseBase build, it should be a relatively
quick one to nail. Eliot's synopsis seems right, though write barriers
are presumably not the issue given that MarkSweep shows the problem.
How many processors do you use? If you're not sure, try reproducing it
with -X:processors=1 (this should remove GC-time concurrency as a likely
source of the problem).
--Steve
OK, you've probably got a much better chance of figuring this out than me, so
here goes:

http://sourceforge.net/tracker/index.php?func=detail&aid=1549822&group_id=128805&atid=712768

I re-ran the test with -X:processors=1, which didn't make a difference (and
I've only got one processor in this machine anyway). Don't hesitate to write
if you need more information - I'm eager to get Jikes RVM to run all my Java
stuff.

- elias

Steve Blackburn
2006-08-31 22:54:26 UTC
Permalink
OK. Your bug report looks good. I'll attack it ASAP. It may only take
a few hours, but then again, GC bugs have a way of not turning out that
way... :-)

--Steve
Steve Blackburn
2006-09-06 02:12:44 UTC
Permalink
Hi Elias,

I have finally had a chance to look at your problem. Unfortunately I
can't reproduce it. Your test runs without any problem on the current
svn head in my environment (which is a default debian environment, using
an unchanged config file).

You're going to have to tell us more.

I note that in your bug tracker record, you now say the system crashes even
with a trivial main which just includes a call to System.gc(). That
sounds dire indeed. Have you modified your system in any way? Can you
test with an unmodified system straight from svn? Can you perform any
of the regular regressions? My feeling is that there is something
fundamentally wrong, something which somehow relates to your environment.

If I can reproduce your bug I'll be very glad to nail it.

--Steve
Eliot Moss
2006-09-06 02:20:41 UTC
Permalink
I wonder if Elias's system loads a .so file at a conflicting address?

-- Eliot

Steve Blackburn
2006-09-06 02:17:35 UTC
Permalink
Elias,

A long shot... perhaps Jikes RVM and your ld.so are not playing
together well; this may be particularly true if you're loading other
libraries.

Can you use ldd to find out where your libraries are being loaded, and
use -X:gc:verbose=3 to see (right at the start of the boot) how MMTk is
carving up virtual memory? If there's a clash, then you may see bizarre
behavior like this.

--Steve
Elias Naur
2006-09-06 08:02:19 UTC
Permalink
Post by Steve Blackburn
Elias,
A long shot... ...perhaps Jikes RVM and your ld.so are not playing
together well, this may be particularly true if you're loading in other
libraries.
Can you use ldd to find out where your libraries are being loaded, and
use -X:gc:verbose=3 to see (right at the start of the boot) how MMTk is
carving up virtual memory? If there's a clash, then you may see bizarre
behavior like this.
--Steve
This is unfortunate. I've looked at the addresses and they don't seem to clash
with anything (isn't library code marked read-only, so that I'd get an access
violation if the mappings overlapped?). I'll look into this some more, but
here's some more information:

System specs:
SUSE 10.1
Intel Pentium M
uname: Linux ip173 2.6.16.21-0.13-default #1 Mon Jul 17 17:22:44 UTC 2006 i686
i686 i386 GNU/Linux

Here's a sample output of a trivial System.gc() test from -X:gc:verbose=3
-X:gc:fullHeapSystemGC=true -X:processors=1 -X:gc:sanityCheck=true with RVM
built from a clean checkout (with an almost clean config - only the
HOST_JAVA_HOME is changed to point at the right location for the SUN JDK):

boot 0x47000000->0x56ffffff
immortal 0x57000000->0x58ffffff
meta 0x59000000->0x5affffff
los 0x5b000000->0x5dffffff
plos 0xa9000000->0xafffffff
nursery 0x9a000000->0xa8ffffff
ms 0x5e000000->0x8fbfffff
sanity 0x8fc00000->0x91bfffff
Collection sanity checking enabled.
Collection triggered due to resource exhaustion
Collection 1: reserved = 22 MB (5634 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 12.57 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.07 Mb + los 0.80 Mb + plos 2.12 Mb + nursery 9.43 Mb + ms 0.00 Mb + sanity 0.00 Mb

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
roots objects refs null
10343 320975 998502 290950
================================================================================

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
roots objects refs null
10343 321575 999222 291070
================================================================================
After Collection: used = 7.32 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.00 Mb + los 0.80 Mb + plos 2.10 Mb + nursery 0.00 Mb + ms 4.28 Mb + sanity 0.00 Mb
reserved = 7 MB (1874 pgs) total = 20 MB (5120 pgs)
Collection time: 2488.18 seconds
Collection finished (ms): 2495.66
Collection triggered due to external request
[Full heap]Collection 2: reserved = 9 MB (2359 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 8.71 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.01 Mb + los 1.33 Mb + plos 2.46 Mb + nursery 0.50 Mb + ms 4.28 Mb + sanity 0.00 Mb

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
ERROR: SanityRC = 1, SpaceRC = 0 0x5b004018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b01c018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b060018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0a4018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 0, SpaceRC = 0 0x5b0af018 [los] [Lcom/ibm/JikesRVM/classloader/VM_Atom;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0c3018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0ce018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b112018 [los] [B
roots objects refs null
10387 323567 1007190 292919
================================================================================


and here's the /proc/*/maps for the JikesRVM process (I stopped it right
after the ERROR: printouts):

***@ip173:~> cat /proc/4634/maps
08048000-0805e000 r-xp 00000000 03:06 2679523 /home/elias/svn/rvmroot/build/JikesRVM
0805e000-08060000 rwxp 00016000 03:06 2679523 /home/elias/svn/rvmroot/build/JikesRVM
08060000-08081000 rwxp 08060000 00:00 0 [heap]
47000000-47c69000 rwxp 00000000 03:06 2679426 /home/elias/svn/rvmroot/build/RVM.data.image
4b000000-4b2a0000 rwxp 00000000 03:06 2679487 /home/elias/svn/rvmroot/build/RVM.code.image
4e000000-4e047000 rwxp 00000000 03:06 2679488 /home/elias/svn/rvmroot/build/RVM.rmap.image
57000000-57100000 rwxp 57000000 00:00 0
59000000-59100000 rwxp 59000000 00:00 0
5b000000-5b200000 rwxp 5b000000 00:00 0
5e000000-5e500000 rwxp 5e000000 00:00 0
8fc00000-90d00000 rwxp 8fc00000 00:00 0
9a000000-9aa00000 rwxp 9a000000 00:00 0
a9000000-a9300000 rwxp a9000000 00:00 0
b74df000-b74e1000 r-xp 00000000 03:06 2679515 /home/elias/svn/rvmroot/build/libjpnexec.so
b74e1000-b74e2000 rwxp 00001000 03:06 2679515 /home/elias/svn/rvmroot/build/libjpnexec.so
b74e2000-b74e3000 ---p b74e2000 00:00 0
b74e3000-b7ce4000 rwxp b74e3000 00:00 0
b7ce4000-b7dfd000 r-xp 00000000 03:06 106387 /lib/libc-2.4.so
b7dfd000-b7dff000 r-xp 00118000 03:06 106387 /lib/libc-2.4.so
b7dff000-b7e01000 rwxp 0011a000 03:06 106387 /lib/libc-2.4.so
b7e01000-b7e04000 rwxp b7e01000 00:00 0
b7e04000-b7e0e000 r-xp 00000000 03:06 152436 /lib/libgcc_s.so.1
b7e0e000-b7e0f000 rwxp 00009000 03:06 152436 /lib/libgcc_s.so.1
b7e0f000-b7e32000 r-xp 00000000 03:06 151189 /lib/libm-2.4.so
b7e32000-b7e34000 rwxp 00022000 03:06 151189 /lib/libm-2.4.so
b7e34000-b7e35000 rwxp b7e34000 00:00 0
b7e35000-b7f0a000 r-xp 00000000 03:06 15308 /usr/lib/libstdc++.so.6.0.8
b7f0a000-b7f0d000 r-xp 000d5000 03:06 15308 /usr/lib/libstdc++.so.6.0.8
b7f0d000-b7f0f000 rwxp 000d8000 03:06 15308 /usr/lib/libstdc++.so.6.0.8
b7f0f000-b7f15000 rwxp b7f0f000 00:00 0
b7f15000-b7f25000 r-xp 00000000 03:06 1093267 /lib/libpthread-2.4.so
b7f25000-b7f27000 rwxp 0000f000 03:06 1093267 /lib/libpthread-2.4.so
b7f27000-b7f29000 rwxp b7f27000 00:00 0
b7f29000-b7f2b000 r-xp 00000000 03:06 126119 /lib/libdl-2.4.so
b7f2b000-b7f2d000 rwxp 00001000 03:06 126119 /lib/libdl-2.4.so
b7f2e000-b7f35000 r-xp 00000000 03:06 2679493 /home/elias/svn/rvmroot/build/libjavanio.so
b7f35000-b7f36000 rwxp 00006000 03:06 2679493 /home/elias/svn/rvmroot/build/libjavanio.so
b7f36000-b7f3a000 r-xp 00000000 03:06 2655794 /home/elias/svn/rvmroot/build/libjavaio.so
b7f3a000-b7f3b000 rwxp 00003000 03:06 2655794 /home/elias/svn/rvmroot/build/libjavaio.so
b7f3b000-b7f4a000 r-xp 00000000 03:06 2679490 /home/elias/svn/rvmroot/build/libjavalang.so
b7f4a000-b7f4b000 rwxp 0000e000 03:06 2679490 /home/elias/svn/rvmroot/build/libjavalang.so
b7f4b000-b7f4d000 r-xp 00000000 03:06 2679516 /home/elias/svn/rvmroot/build/libsyswrap.so
b7f4d000-b7f4e000 rwxp 00001000 03:06 2679516 /home/elias/svn/rvmroot/build/libsyswrap.so
b7f4e000-b7f4f000 rwxp b7f4e000 00:00 0
b7f4f000-b7f69000 r-xp 00000000 03:06 75310 /lib/ld-2.4.so
b7f69000-b7f6b000 rwxp 00019000 03:06 75310 /lib/ld-2.4.so
bf94e000-bf965000 rwxp bf94e000 00:00 0 [stack]
ffffe000-fffff000 ---p 00000000 00:00 0 [vdso]
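
The comparison between the MMTk space ranges and the shared-library mappings can also be done mechanically rather than by eye. The following is only a rough sketch, not part of Jikes RVM: the class OverlapCheck and its methods are hypothetical names, and it assumes each space line and each maps line has been joined onto a single line. Mappings whose path does not contain ".so" (the heap, the stack, the RVM images) are skipped, since those are expected to sit inside the spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapCheck {
    /** A labeled half-open address range [start, end). */
    public static final class Range {
        final String name;
        final long start;
        final long end;

        Range(String name, long start, long end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Range other) {
            return start < other.end && other.start < end;
        }
    }

    /** Parses a space line such as "los 0x5b000000->0x5dffffff"; the printed end is inclusive. */
    public static Range parseSpace(String line) {
        String[] parts = line.trim().split("\\s+");
        String[] bounds = parts[1].split("->");
        return new Range(parts[0], Long.decode(bounds[0]), Long.decode(bounds[1]) + 1);
    }

    /** Parses an unwrapped /proc/<pid>/maps line; its end address is already exclusive. */
    public static Range parseMapping(String line) {
        String[] parts = line.trim().split("\\s+");
        String[] bounds = parts[0].split("-");
        String name = parts.length > 5 ? parts[5] : "[anonymous]";
        return new Range(name, Long.parseLong(bounds[0], 16), Long.parseLong(bounds[1], 16));
    }

    /** Reports every shared-library mapping that intersects one of the spaces. */
    public static List<String> findClashes(List<String> spaceLines, List<String> mapLines) {
        List<String> clashes = new ArrayList<>();
        for (String mapLine : mapLines) {
            Range mapping = parseMapping(mapLine);
            if (!mapping.name.contains(".so")) continue; // skip heap, stack, RVM images
            for (String spaceLine : spaceLines) {
                Range space = parseSpace(spaceLine);
                if (space.overlaps(mapping)) {
                    clashes.add(space.name + " overlaps " + mapping.name);
                }
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        // Space ranges and a few library mappings copied from the output above.
        List<String> spaces = Arrays.asList(
            "boot 0x47000000->0x56ffffff", "immortal 0x57000000->0x58ffffff",
            "meta 0x59000000->0x5affffff", "los 0x5b000000->0x5dffffff",
            "plos 0xa9000000->0xafffffff", "nursery 0x9a000000->0xa8ffffff",
            "ms 0x5e000000->0x8fbfffff", "sanity 0x8fc00000->0x91bfffff");
        List<String> maps = Arrays.asList(
            "5b000000-5b200000 rwxp 5b000000 00:00 0",
            "b74df000-b74e1000 r-xp 00000000 03:06 2679515 /home/elias/svn/rvmroot/build/libjpnexec.so",
            "b7ce4000-b7dfd000 r-xp 00000000 03:06 106387 /lib/libc-2.4.so",
            "b7f4f000-b7f69000 r-xp 00000000 03:06 75310 /lib/ld-2.4.so");
        List<String> clashes = findClashes(spaces, maps);
        System.out.println(clashes.isEmpty() ? "no library/space overlap found" : clashes.toString());
    }
}
```

All the library mappings listed above start at 0xb74df000 or higher, above the top of the highest space (plos, ending at 0xafffffff), which is consistent with the observation that nothing appears to clash.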

Elias Naur
2006-09-06 08:57:56 UTC
Permalink
Post by Elias Naur
This is unfortunate. I've looked at the addresses and they don't seem to
clash with anything (aren't library code marked read-only so that I'd get
an access violation of the mappings overlapped?). I'll look into this some
SUSE 10.1
Intel Pentium M
uname: Linux ip173 2.6.16.21-0.13-default #1 Mon Jul 17 17:22:44 UTC 2006
i686 i686 i386 GNU/Linux
Ok, I just tried the same test on a campus terminal with the specs:

Ubuntu 6.06
Pentium 4
Linux lucia 2.6.15-26-686 #1 SMP PREEMPT Thu Aug 3 03:13:28 UTC 2006 i686
GNU/Linux

And the test still fails with roughly the same output:

(***@lucia) ~/jikesgc> ./run.sh
boot 0x47000000->0x56ffffff
immortal 0x57000000->0x58ffffff
meta 0x59000000->0x5affffff
los 0x5b000000->0x5dffffff
plos 0xa9000000->0xafffffff
nursery 0x9a000000->0xa8ffffff
ms 0x5e000000->0x8fbfffff
sanity 0x8fc00000->0x91bfffff
Collection sanity checking enabled.
Collection triggered due to resource exhaustion
Collection 1: reserved = 22 MB (5634 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 12.57 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.07 Mb + los 0.80 Mb + plos 2.12 Mb + nursery 9.43 Mb + ms 0.00 Mb + sanity 0.00 Mb

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
roots objects refs null
10342 320975 998503 290951
================================================================================

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
roots objects refs null
10342 321575 999223 291071
================================================================================
After Collection: used = 7.32 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.00 Mb + los 0.80 Mb + plos 2.10 Mb + nursery 0.00 Mb + ms 4.28 Mb + sanity 0.00 Mb
reserved = 7 MB (1874 pgs) total = 20 MB (5120 pgs)
Collection time: 2864.14 seconds
Collection finished (ms): 2876.00
Collection triggered due to external request
[Full heap]Collection 2: reserved = 12 MB (3155 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 10.73 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta 0.01 Mb + los 1.41 Mb + plos 3.29 Mb + nursery 1.59 Mb + ms 4.28 Mb + sanity 0.00 Mb

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
ERROR: SanityRC = 1, SpaceRC = 0 0x5b004018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b01c018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b060018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0a4018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0c3018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 0, SpaceRC = 0 0x5b0ce018 [los] [Lcom/ibm/JikesRVM/classloader/VM_Atom;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0e3018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b127018 [los] [B
roots objects refs null
10620 328679 1022306 296197
================================================================================

============================== GC Sanity Checking ==============================
Performing Sanity Checks...
ERROR: SanityRC = 0, SpaceRC = 0 0x4700000c [boot] Lcom/ibm/JikesRVM/VM_BootRecord;
ERROR: SanityRC = 845, SpaceRC = 0 0x4b00000c [boot] [Lcom/ibm/JikesRVM/VM_Code;
ERROR: SanityRC = 752, SpaceRC = 0 0x4b000074 [boot] [Lcom/ibm/JikesRVM/VM_Code;
ERROR: SanityRC = 865, SpaceRC = 0 0x4b0000e8 [boot] [Lcom/ibm/JikesRVM/VM_Code;
ERROR: SanityRC = 873, SpaceRC = 0 0x4b000140 [boot] [Lcom/ibm/JikesRVM/VM_Code;
............ (it continues like that for pages) ...................
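
When the sanity checker prints pages of ERROR lines, a per-space, per-type tally is easier to compare between runs than the raw output. The sketch below is a hypothetical helper (SanityErrorTally is not part of Jikes RVM or MMTk) that assumes the line format seen in this thread. It tolerates an optional "[java]" prefix from Ant, and reads the type descriptor from the following line when the mail client has wrapped it.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SanityErrorTally {
    // Matches e.g. "ERROR: SanityRC = 1, SpaceRC = 0 0x5b01c018 [los] [B".
    // Group 4 is the space name; group 5 is the type descriptor, absent if wrapped.
    private static final Pattern ERROR_LINE = Pattern.compile(
        "^(?:\\[java\\]\\s*)?ERROR: SanityRC = (\\d+), SpaceRC = (\\d+)\\s+"
        + "(0x[0-9a-fA-F]+)\\s+\\[(\\w+)\\]\\s*(\\S+)?\\s*$");

    /** Counts ERROR lines in the log, keyed by "<space> <type descriptor>". */
    public static Map<String, Integer> tally(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        String[] lines = log.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = ERROR_LINE.matcher(lines[i].trim());
            if (!m.matches()) {
                continue; // not an ERROR line (headers, totals, continuation lines)
            }
            String type = m.group(5);
            if (type == null) {
                // Wrapped output: the type descriptor sits on the next line, if any.
                type = (i + 1 < lines.length) ? lines[i + 1].trim() : "?";
            }
            counts.merge(m.group(4) + " " + type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the output above, including one wrapped entry.
        String sample = String.join("\n",
            "ERROR: SanityRC = 1, SpaceRC = 0 0x5b004018 [los] [B",
            "ERROR: SanityRC = 1, SpaceRC = 0 0x5b01c018 [los] [B",
            "ERROR: SanityRC = 1, SpaceRC = 0 0x5b0a4018 [los]",
            "[Ljava/util/HashMap$HashEntry;",
            "ERROR: SanityRC = 845, SpaceRC = 0 0x4b00000c [boot] [Lcom/ibm/JikesRVM/VM_Code;");
        tally(sample).forEach((key, count) -> System.out.println(count + "  " + key));
    }
}
```

Keying on the space name makes it easy to see whether a given sanity round only reports objects in "los" or has already spread to "boot", as in the second round above.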

- elias

Steve Blackburn
2006-09-06 09:21:52 UTC
Permalink
Hmmm... this is not making much sense.

You say in your updated bug report that simply compiling:

public class Test {
    public static void main(String[] args) {
        System.gc();
    }
}

and executing it against a MarkSweep build will fail. I could not
repeat this. If this trivial example were to fail, I think many if not
most of our nightly regressions would be failing, so if I understand
what you're saying correctly, I think there must be something unusual
about your environment.

Are you able to run any regression tests? Can you run SPECjvm98, for
example (this calls System.gc() at the start and end of each iteration)?

I noticed that you are not using a standard build config. Can you
verify that this bug arises without the modifications you made to the build
config (e.g. your changes to CC, etc.)? You should be able to build on
your Ubuntu platform with the default config (perhaps JAVA_HOME may need
to change).

I'd really like to get this nailed, but I can't reproduce it...

--Steve
_______________________________________________
Jikesrvm-researchers mailing list
https://lists.sourceforge.net/lists/listinfo/jikesrvm-researchers
Elias Naur
2006-09-06 10:14:46 UTC
Permalink
Post by Steve Blackburn
Hmmm... this is not making much sense.
public class Test {
    public static void main(String[] args) {
        System.gc();
    }
}
and executing it against a MarkSweep build will fail. I could not
repeat this. If this trivial example were to fail, I think many if not
most of our nightly regressions would be failing, so if I understand
what you're saying correctly, I think there must be something unusual
about your environment.
Are you able to run any regression tests? Can you run SPECjvm98, for
example (this calls System.gc() at the start and end of each iteration)?
I noticed that you are not using a standard build config. Can you
verify that this bug arises without the modifications you made to build
config (e.g. your changes to CC etc.)? You should be able to build on
your Ubuntu platform with the default config (perhaps JAVA_HOME may need
to change).
I'd really like to get this nailed, but I can't reproduce it...
--Steve
Are you running on an Intel machine with -X:gc:sanityCheck=true and
-X:gc:fullHeapSystemGC=true? Here's the output from a BaseBaseMarkSweep
configuration with only JAVA_HOME changed in the i686 config file on the
Ubuntu machine (totally unrelated to my own SUSE machine, which exhibits the
same behaviour):

(***@lucia) ~/jikesgc> cat Test.java
public class Test {
    public static void main(String[] args) {
        System.gc();
    }
}
(***@lucia) ~/jikesgc> javac *.java && rvm -X:gc:verbose=3
-X:gc:fullHeapSystemGC=true -X:processors=1 -X:gc:sanityCheck=true Test
boot 0x47000000->0x56ffffff
immortal 0x57000000->0x58ffffff
meta 0x59000000->0x5affffff
los 0x5b000000->0x5dffffff
plos 0xa9000000->0xafffffff
ms 0x5e000000->0x99bfffff
sanity 0x99c00000->0x9bbfffff
Collection sanity checking enabled.
Collection triggered due to external request
Collection 1: reserved = 16 MB (4314 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 16.85 Mb = boot 0.00 Mb + immortal 0.12 Mb + meta
0.00 Mb + los 1.41 Mb + plos 3.30 Mb + ms 12.00 Mb + sanity 0.00 Mb

============================== GC Sanity Checking
==============================
Performing Sanity Checks...
ERROR: SanityRC = 1, SpaceRC = 0 0x5b000018 [los] [Lcom/ibm/JikesRVM/VM_Code;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b003018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b01b018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b05f018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0a3018 [los]
[Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0c2018 [los]
[Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 0, SpaceRC = 0 0x5b0cd018 [los]
[Lcom/ibm/JikesRVM/classloader/VM_Atom;
ERROR: SanityRC = 1, SpaceRC = 0 0x5b0e2018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x5b126018 [los] [B
roots objects refs null
10543 326352 1014702 295093
================================================================================
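
Output like the above can run to pages, so a per-space tally may be easier to compare between runs. The following is a minimal sketch, not part of Jikes RVM: the class name SanityErrorSummary and the regular expression are assumptions derived only from the ERROR line format visible in this log. Type names that the mail client wrapped onto the next line are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SanityErrorSummary {
    // Matches lines such as:
    //   ERROR: SanityRC = 1, SpaceRC = 0 0x5b003018 [los] [B
    // Only the space name in the first pair of brackets is captured.
    private static final Pattern ERROR_LINE = Pattern.compile(
        "^ERROR: SanityRC = \\d+, SpaceRC = \\d+ 0x[0-9a-fA-F]+ \\[(?<space>\\w+)\\]");

    /** Counts ERROR lines per space name, in order of first appearance. */
    public static Map<String, Integer> countBySpace(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ERROR_LINE.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group("space"), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "ERROR: SanityRC = 1, SpaceRC = 0 0x5b000018 [los] [Lcom/ibm/JikesRVM/VM_Code;",
            "ERROR: SanityRC = 1, SpaceRC = 0 0x5b003018 [los] [B",
            "ERROR: SanityRC = 0, SpaceRC = 0 0x4700000c [boot]",
            "Lcom/ibm/JikesRVM/VM_BootRecord;",
            "roots objects refs null");
        System.out.println(countBySpace(sample)); // prints {los=2, boot=1}
    }
}
```

Feeding the captured rvm output through countBySpace line by line would show at a glance whether the errors are confined to los or also cover boot.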
Steve Blackburn
2006-09-06 11:16:30 UTC
Permalink
Thanks Elias,

Your initial report indicates a GC-related crash. That is what I want
to reproduce.

When I follow the exact instructions in your message, I do not get a crash,
but I do get warnings from the GC sanity checker. These look
interesting, and are consistent with other information about this bug.

However, what I really want is to be able to reproduce the error you
saw. I still can't do that. :-/

--Steve
Elias Naur
2006-09-06 11:37:19 UTC
Permalink
Post by Steve Blackburn
Thanks Elias,
Your initial report indicates a GC-related crash. That is what I want
to reproduce.
When I follow the exact instructions I see below, I do not get a crash,
but I do get warnings from the GC sanity checker. These look
interesting, and are consistent with other information about this bug.
However, what I really want is to be able to reproduce the error you
saw. I still can't do that. :-/
--Steve
I'm sorry, I misunderstood you then, thinking that the sanity checker faults
were the focus. The GC crash I reported happens with a larger program. I'll
isolate and package up a test for you ASAP.

- elias

Steve Blackburn
2006-09-06 11:58:55 UTC
Permalink
Hi Elias,
Post by Elias Naur
I'm sorry, I misunderstood you then, thinking that the sanity checker faults
were the focus. The GC crash I reported happens with a larger program. I'll
isolate and package up a test for you ASAP.
Thanks for this. The sanity checker is useful, but it does have
faults. In fact with the current svn head it produces huge numbers of
non-errors relating to the boot image. That's why I was focused on
trying to reproduce an actual failure.

Of course the simpler the context in which that failure arises, the
easier it will be to debug it. If you can show it with a BaseBase build
and a full-heap collector, so much the better, since these are the
simplest and best-understood environments.

Cheers,

--Steve

Elias Naur
2006-09-06 17:15:24 UTC
Permalink
Post by Steve Blackburn
Hi Elias,
Post by Elias Naur
I'm sorry, I misunderstood you then, thinking that the sanity checker
faults were the focus. The GC crash I reported happens with a larger
program. I'll isolate and package up a test for you ASAP.
Thanks for this. The sanity checker is useful, but it does have
faults. In fact with the current svn head it produces huge numbers of
non-errors relating to the boot image. That's why I was focused on
trying to reproduce an actual failure.
Of course the simpler the context in which that failure arises, the
easier it will be to debug it. If you can show it with a BaseBase build
and a full heap collector, so much the better, since these are the
simplest and most well understood environments.
I've updated the bug report with a test that should exhibit the GC-related
crash:

http://sourceforge.net/tracker/index.php?func=detail&aid=1549822&group_id=128805&atid=712768

I'm afraid I couldn't prune the program any more without losing the crash
behaviour, so it still parses large XML files. Also, I could only reproduce
the crash with the prototype configuration, not with BaseBaseMarkSweep. I hope
this is helpful to you.

- elias

Steve Blackburn
2006-09-07 12:15:16 UTC
Permalink
Thanks Elias. I was able to reproduce the bug with this. I have a fair
idea of what is going on and hopefully will have it fixed tomorrow.

--Steve
Steve Blackburn
2006-09-12 11:46:43 UTC
Permalink
I have identified the problem and committed a temporary fix.

The problem manifested as a corrupt tib for some object, x, but in fact
(after some tedious investigation) it was due to x being overwritten by y.
Since x had a larger header due to address-based hashing, what was the
tib for x was now the first field of y, which happened to be a valid
object, but not a tib.

The reason x was overwritten by y was that some of the freeing code was
not accommodating the possibility of the larger header for object x and
was thus determining that there was no object in that cell and therefore
declaring it free---so it was reused.

The temporary fix is to declare that GenMS needs support for linear
scanning, but this is probably not the right way to fix the problem. I
hope to have this resolved very soon.
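
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the MMTk code: the class HashedHeaderSketch, the one-word hash prefix, the side table of marked references and all constants are hypothetical, chosen so that an object with the larger header has its reference one word past the start of its cell.

```java
import java.util.HashSet;
import java.util.Set;

public class HashedHeaderSketch {
    // Hypothetical: a hashed object carries one extra word in front of its tib.
    static final int HASH_WORDS = 1;

    private final int[] heap = new int[16];
    // Hypothetical side table holding the references the trace found live.
    private final Set<Integer> marked = new HashSet<>();

    /** Lays out [hash?][tib][field0] in the cell; returns the object reference (index of the tib word). */
    public int allocate(int cellStart, boolean hashed, int tib, int field0) {
        int ref = cellStart + (hashed ? HASH_WORDS : 0);
        if (hashed) heap[cellStart] = 0x1234; // stored hash code (arbitrary value)
        heap[ref] = tib;
        heap[ref + 1] = field0;
        return ref;
    }

    public void mark(int ref) { marked.add(ref); }

    public int wordAt(int index) { return heap[index]; }

    /** Buggy sweep test: assumes every object reference equals the cell start. */
    public boolean cellIsLiveBuggy(int cellStart) {
        return marked.contains(cellStart);
    }

    /** Sweep test that also considers an object with the larger header. */
    public boolean cellIsLiveFixed(int cellStart) {
        return marked.contains(cellStart) || marked.contains(cellStart + HASH_WORDS);
    }

    public static void main(String[] args) {
        HashedHeaderSketch h = new HashedHeaderSketch();
        final int tibX = 100, tibY = 200, someObject = 300;
        int cell = 4;

        int x = h.allocate(cell, true, tibX, 0); // x has the extra hash word
        h.mark(x);                               // the trace found x live

        System.out.println("fixed sweep sees a live cell: " + h.cellIsLiveFixed(cell)); // true
        if (!h.cellIsLiveBuggy(cell)) {
            // The buggy sweep believes the cell is empty, so it is reused for y.
            h.allocate(cell, false, tibY, someObject);
        }
        // x's tib slot now holds y's first field: a valid object, but not a tib.
        System.out.println("x's tib word is now: " + h.wordAt(x)); // 300
    }
}
```

In this model the corruption only appears once a hashed object sits in a cell that the sweep inspects at the unhashed offset.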

--Steve
Elias Naur
2006-09-06 10:34:34 UTC
Permalink
Post by Steve Blackburn
Are you able to run any regression tests? Can you run SPECjvm98, for
example (this calls System.gc() at the start and end of each iteration)?
I don't have access to the SPECjvm98 test, but I can confirm that the internal
"gctest" RVM tests (on both the SUSE and the Ubuntu machine) pass, as does
the DaCapo (beta-2006-08) antlr test, although with a strange NPE on exit:

(***@lucia) ~> rvm -jar dacapo-beta-2006-08.jar antlr
.... [snipped] ...
ANTLR Parser Generator Version 2.7.2 1989-2003 jGuru.com
error: file ./scratch/antlr/java/unicode.g not found
Running antlr on grammar antlr/java/xml.g
ANTLR Parser Generator Version 2.7.2 1989-2003 jGuru.com
error: file ./scratch/antlr/java/xml.g not found
===== DaCapo antlr PASSED in 32398 msec =====
java.lang.NullPointerException
java.lang.NullPointerException
at dacapo.Benchmark.deleteTree((null); machine code offset:
0x00000146)
at dacapo.Benchmark.cleanup((null); machine code offset: 0x000000B0)
at dacapo.TestHarness.main((null); machine code offset: 0x00001465)
at Harness.main((null); machine code offset: 0x00000065)

However, they don't run with the sanitycheck nor the forcefullgc flags
enabled.

- elias

Elias Naur
2006-09-06 11:30:28 UTC
Permalink
Post by Steve Blackburn
Hmmm... this is not making much sense.
public class Test {
    public static void main(String[] args) {
        System.gc();
    }
}
and executing it against a MarkSweep build will fail. I could not
repeat this. If this trivial example were to fail, I think many if not
most of our nightly regressions would be failing, so if I understand
what you're saying correctly, I think there must be something unusual
about your environment.
OK, this is getting odd: I tried the System.gc() test with sanitycheck and
forcefullgc enabled on a clean RVM (using the default powerpc-unknown-osx-gnu
config with no changes) on a G5 running Mac OS X and I got nearly the same
errors. What am I doing wrong?

boot 0x31000000->0x40ffffff
immortal 0x41000000->0x42ffffff
meta 0x43000000->0x44ffffff
los 0x45000000->0x473fffff
plos 0x7ac00000->0x7fffffff
ms 0x47400000->0x737fffff
sanity 0x73800000->0x757fffff
Collection sanity checking enabled.
Collection triggered due to external request
Collection 1: reserved = 14 MB (3668 pgs) total = 20 MB (5120 pgs)
Before Collection: used = 14.32 Mb = boot 0.00 Mb + immortal 0.09 Mb + meta
0.00 Mb + los 0.63 Mb + plos 2.58 Mb + ms 11.01 Mb + sanity 0.00 Mb

============================== GC Sanity Checking
==============================
Performing Sanity Checks...
ERROR: SanityRC = 1, SpaceRC = 0 0x45000018 [los] [Lcom/ibm/JikesRVM/VM_Code;
ERROR: SanityRC = 1, SpaceRC = 0 0x45004018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x4500c018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x45014018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 0, SpaceRC = 0 0x4501f018 [los] [Lcom/ibm/JikesRVM/classloader/VM_Atom;
ERROR: SanityRC = 1, SpaceRC = 0 0x45032018 [los] [Ljava/util/HashMap$HashEntry;
ERROR: SanityRC = 0, SpaceRC = 0 0x4503d018 [los] [Lcom/ibm/JikesRVM/classloader/VM_MemberReference;
ERROR: SanityRC = 1, SpaceRC = 0 0x45065018 [los] [B
ERROR: SanityRC = 1, SpaceRC = 0 0x4506d018 [los] [B
ERROR: SanityRC = 0, SpaceRC = 0 0x7ae49018 [plos] [I
roots objects refs null
10152 314913 946901 267010
================================================================================
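
One simple cross-check on the output above is whether each reported address really lies inside the space named in brackets. The following is a minimal sketch, not part of the RVM: the class SpaceLookup and its method are hypothetical, and the ranges are copied by hand from the space listing printed at startup.

```java
public class SpaceLookup {
    // Space names and inclusive address ranges, copied from the log above.
    static final String[] NAMES = {
        "boot", "immortal", "meta", "los", "plos", "ms", "sanity"
    };
    static final long[] START = {
        0x31000000L, 0x41000000L, 0x43000000L, 0x45000000L,
        0x7ac00000L, 0x47400000L, 0x73800000L
    };
    static final long[] END = {
        0x40ffffffL, 0x42ffffffL, 0x44ffffffL, 0x473fffffL,
        0x7fffffffL, 0x737fffffL, 0x757fffffL
    };

    // Return the name of the space containing the address, or "unknown".
    static String spaceOf(long address) {
        for (int i = 0; i < NAMES.length; i++) {
            if (address >= START[i] && address <= END[i]) {
                return NAMES[i];
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // A few of the addresses from the ERROR lines.
        long[] reported = {0x45000018L, 0x4503d018L, 0x7ae49018L};
        for (long address : reported) {
            System.out.println("0x" + Long.toHexString(address)
                    + " -> " + spaceOf(address));
        }
    }
}
```

For the addresses listed, the lookup agrees with the bracketed space names, which suggests the objects are where the sanity checker says they are and that the mismatch is in the reference counts rather than in the space layout.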

However, the RVM hangs on the DaCapo antlr test:

ip250:~/svn/rvmroot/testing/harness/tests/gctest oddlabs$ rvm -jar ~/svn/rvmroot/dacapo-beta-2006-08.jar antlr
0x3530bf4cJikesRVM: WARNING: Virtual processor has ignored timer interrupt for 5000 ms.
This may indicate that a blocking system call has occured and the VM is deadlocked
JikesRVM: WARNING: Virtual processor has ignored timer interrupt for 6000 ms.
This may indicate that a blocking system call has occured and the VM is deadlocked
JikesRVM: WARNING: Virtual processor has ignored timer interrupt for 7000 ms.
....[snip].....


The gctests pass until Exhaust is reached, which times out even with a
600-second limit:

Running LargeAlloc with limit of 600 seconds...
/Users/oddlabs/svn/rvmroot/rvm/bin/rvm -Xmx60M -classpath ./cp LargeAlloc base

You are sane LargeAlloc

make Exhaust HEAPSIZE=50
make START_NAME=Exhaust sanity-check-rule MY_RULE='/usr/bin/fgrep -q "Overall:"'
make bench-rvm
/usr//bin/javac -encoding iso-8859-1 -d ./cp
-classpath /Users/oddlabs/svn/rvmroot/testing/tests/gctest:./cp:/Users/oddlabs/svn/rvmroot/build/RVM.classes/jksvm.jar:/Users/oddlabs/svn/rvmroot/build/RVM.classes/rvmrt.jar /Users/oddlabs/svn/rvmroot/testing/tests/gctest/Exhaust.java

Running Exhaust with limit of 600 seconds...
/Users/oddlabs/svn/rvmroot/rvm/bin/rvm -Xmx50M -classpath ./cp Exhaust base
make[4]: *** [check] Error 1
make[3]: *** [do-sanity-check-rule] Error 2
make[2]: *** [sanity-check-rule] Error 2
make[1]: *** [Exhaust] Error 2
make: *** [sanity] Error 2

- elias
