Discussion:
The problem with the D GC
Oskar Linde
2007-01-08 12:22:00 UTC
Permalink
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.

Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.

import std.random;

void main() {
// The real memory use, ~20 mb
uint[] data;
data.length = 5_000_000;
foreach(inout x; data)
x = rand();
while(1) {
// simulate reading a few kb of data
uint[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(inout x; incoming)
x = rand();
// do something with the data...
}
}

The result may not be as expected. The program will use up all available
memory (for me crashing at about 2.7 gb of memory usage) and at the same
time run extremely slow due to the panicked GC scanning all memory over
and over.

The reason is the 20 mb of random data and the small 32-bit memory
address range of 4 GB. To understand how bad this is, 20 mb of random
data will result in _each_ 4k memory page on average having 5 random
pointers to it. Those spurious pointers are laying a dense mine-field
effectively disabling the GC.

This means that each time you rely on the GC (array appending/resizing,
Phobos function calls etc), you have a potential memory leak. (That is
unless all the program data is nothing but valid pointers/references or
all non-pointer data is hidden from the GC.)

The above program is of course just a toy illustrating the phenomena. In
a text processing program of mine the bulk of the data is short char[]
strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D recommended char[] and CoW idiom without getting
severe memory leaks.

The definite solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to
contain elements smaller than sizeof(void*) will probably help
tremendously. (I'd just have to make sure I'm not using dchar[] strings,
float or double data, or the DMD associative array implementation)
--
/Oskar
Johan Granberg
2007-01-08 12:44:41 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
import std.random;
void main() {
// The real memory use, ~20 mb
uint[] data;
data.length = 5_000_000;
foreach(inout x; data)
x = rand();
while(1) {
// simulate reading a few kb of data
uint[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(inout x; incoming)
x = rand();
// do something with the data...
}
}
The result may not be as expected. The program will use up all available
memory (for me crashing at about 2.7 gb of memory usage) and at the same
time run extremely slow due to the panicked GC scanning all memory over
and over.
The reason is the 20 mb of random data and the small 32-bit memory
address range of 4 GB. To understand how bad this is, 20 mb of random
data will result in _each_ 4k memory page on average having 5 random
pointers to it. Those spurious pointers are laying a dense mine-field
effectively disabling the GC.
This means that each time you rely on the GC (array appending/resizing,
Phobos function calls etc), you have a potential memory leak. (That is
unless all the program data is nothing but valid pointers/references or
all non-pointer data is hidden from the GC.)
The above program is of course just a toy illustrating the phenomena. In
a text processing program of mine the bulk of the data is short char[]
strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D recommended char[] and CoW idiom without getting
severe memory leaks.
The definite solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to
contain elements smaller than sizeof(void*) will probably help
tremendously. (I'd just have to make sure I'm not using dchar[] strings,
float or double data, or the DMD associative array implementation)
I have observed the same behavior but did not realize why it happened
(thought it was a gc bug on osx or something). Something that helped the
problem for me was to call fullCollect very often (40 times a second) this
reduced the leak from 1mb a second to almost nothing.
Lutger
2007-01-08 13:00:47 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
....

Wow that is so bad. I thought it has been mentioned this will not be a
problem in practice but it apparently is.
Lionello Lunesu
2007-01-08 13:15:19 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
import std.random;
void main() {
// The real memory use, ~20 mb
uint[] data;
data.length = 5_000_000;
foreach(inout x; data)
x = rand();
while(1) {
// simulate reading a few kb of data
uint[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(inout x; incoming)
x = rand();
// do something with the data...
}
}
The result may not be as expected. The program will use up all available
memory (for me crashing at about 2.7 gb of memory usage) and at the same
time run extremely slow due to the panicked GC scanning all memory over
and over.
The reason is the 20 mb of random data and the small 32-bit memory
address range of 4 GB. To understand how bad this is, 20 mb of random
data will result in _each_ 4k memory page on average having 5 random
pointers to it. Those spurious pointers are laying a dense mine-field
effectively disabling the GC.
This means that each time you rely on the GC (array appending/resizing,
Phobos function calls etc), you have a potential memory leak. (That is
unless all the program data is nothing but valid pointers/references or
all non-pointer data is hidden from the GC.)
The above program is of course just a toy illustrating the phenomena. In
a text processing program of mine the bulk of the data is short char[]
strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D recommended char[] and CoW idiom without getting
severe memory leaks.
The definite solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to
contain elements smaller than sizeof(void*) will probably help
tremendously. (I'd just have to make sure I'm not using dchar[] strings,
float or double data, or the DMD associative array implementation)
I've run into similar problems back when I was messing around with the
Universal Machine for that programing contest. It would run slower and
slower. Skipping GC checks on arrays without pointers is a must, if you
ask me.

L.
Tom S
2007-01-08 15:34:04 UTC
Permalink
(...) gc hell (...)
I've experienced pretty much the same while doing memory-intensive
computations. Since then I've been using lots of malloc/realloc/free and
both my memory footprint and execution speed have improved. The GC needs
a fix. Badly.


--
Tomasz Stachowiak
kenny
2007-01-08 16:24:21 UTC
Permalink
I also have experienced bad GC performance. I found it to be because of
the ~ operator on strings. The program is a daemon, and after it had
been running for a while, memory usage gets truly horrific, and
performance degrades very bad.

This was back on 0.140, so things may have changed since then...

I solved two ways. First, I wrote a function which accepts variadic
arguments, and separated everything by a comma instead of appending the
strings and the performance difference was stunning. it was also nice to
be able to write:

prt("string", my_int, " ", my_float, "string2");

instead of

"string"~std.string.toString(my_int)~"
"~std.string.toString(my_float)~"string2"

Second, like someone else in this thread I also called fullCollect every
second.

I used to use gdc-0.08 with boehm-gc too. I can't honestly remember if
that had the same problem.
Post by Tom S
(...) gc hell (...)
I've experienced pretty much the same while doing memory-intensive
computations. Since then I've been using lots of malloc/realloc/free and
both my memory footprint and execution speed have improved. The GC needs
a fix. Badly.
--
Tomasz Stachowiak
Sean Kelly
2007-01-08 17:09:46 UTC
Permalink
Post by kenny
I also have experienced bad GC performance. I found it to be because of
the ~ operator on strings. The program is a daemon, and after it had
been running for a while, memory usage gets truly horrific, and
performance degrades very bad.
This was back on 0.140, so things may have changed since then...
Probably not. Assuming this is a multithreaded app, you have to pay for
two mutex locks for every concat operation. So an expression like:

a = b ~ c ~ d;

would result in six locks of the mutex protecting the GC to perform
allocations (I think anyway--I don't believe chained concats have
special handling). I think this would also result in two discarded
temporary buffers for the GC to clean up later. And since the Phobos GC
scans all such blocks by default...

My patch will address the scanning issue, but it won't address the mutex
issue--there's really no easy way to avoid that one.


Sean
Frits van Bommel
2007-01-08 17:41:08 UTC
Permalink
Post by Sean Kelly
Assuming this is a multithreaded app, you have to pay for
a = b ~ c ~ d;
would result in six locks of the mutex protecting the GC to perform
allocations (I think anyway--I don't believe chained concats have
special handling). I think this would also result in two discarded
temporary buffers for the GC to clean up later. And since the Phobos GC
scans all such blocks by default...
Chained concats *do* have special handling. See
phobos/internal/arraycat.d, _d_arraycatn() (first function in the file,
can't miss it).
This function takes element size and number of arrays, followed by
C-style vararg arrays.
And yes, it only allocates once :).

By the way, how do you figure six locks? At two locks per concat, that
code only has two concats so even without above special handling
wouldn't it still only lock four times?

Also: Why two locks per concat at all? A concat only requires one
allocation, so does a single alloc require two locks?
(I haven't looked much into the GC code, so this is just curiosity)
Post by Sean Kelly
My patch will address the scanning issue, but it won't address the mutex
issue--there's really no easy way to avoid that one.
What exactly do you mean by "the mutex issue"? Is using mutexes at all
the problem, or is it locking them too often per operation (i.e. more
than once per alloc)?

If you don't want to use them at all, I think the closest you'd get
would involve implementing thread-local heaps.
But that would still require a mutex if you want to allow deleting an
object from a different thread than it was allocated in...
Frits van Bommel
2007-01-08 17:51:22 UTC
Permalink
Post by Frits van Bommel
If you don't want to use them at all, I think the closest you'd get
would involve implementing thread-local heaps.
But that would still require a mutex if you want to allow deleting an
object from a different thread than it was allocated in...
Now that I think about it, maybe it would be possible to eliminate that
mutex as well. Hmm...
Sean Kelly
2007-01-08 19:18:08 UTC
Permalink
Post by Frits van Bommel
Assuming this is a multithreaded app, you have to pay for two mutex
a = b ~ c ~ d;
would result in six locks of the mutex protecting the GC to perform
allocations (I think anyway--I don't believe chained concats have
special handling). I think this would also result in two discarded
temporary buffers for the GC to clean up later. And since the Phobos
GC scans all such blocks by default...
Chained concats *do* have special handling. See
phobos/internal/arraycat.d, _d_arraycatn() (first function in the file,
can't miss it).
This function takes element size and number of arrays, followed by
C-style vararg arrays.
And yes, it only allocates once :).
Oh good :) I knew about the varargs but hadn't given the code a close
enough look to be sure.
Post by Frits van Bommel
By the way, how do you figure six locks? At two locks per concat, that
code only has two concats so even without above special handling
wouldn't it still only lock four times?
I miscounted :p It would have been four without the special handling.
Post by Frits van Bommel
Also: Why two locks per concat at all? A concat only requires one
allocation, so does a single alloc require two locks?
(I haven't looked much into the GC code, so this is just curiosity)
The code in internal/gc/gc.d first checks the size of the array to see
if a realloc is necessary, then it performs the realloc. So worst case
you're stuck with two locks per operation. However, this may not be the
case in the arraycat routines. It's been a while since I've looked at them.
Post by Frits van Bommel
My patch will address the scanning issue, but it won't address the
mutex issue--there's really no easy way to avoid that one.
What exactly do you mean by "the mutex issue"? Is using mutexes at all
the problem, or is it locking them too often per operation (i.e. more
than once per alloc)?
Too often per operation. So given the vararg stuff you mentioned above,
this isn't an issue IMO. As it is, locks should only be acquired if the
app is multithreaded, so this cost is only incurred if it's necessary.
I think Phobos might actually lock in a few instances where it isn't
strictly necessary, but I can't recall for certain.
Post by Frits van Bommel
If you don't want to use them at all, I think the closest you'd get
would involve implementing thread-local heaps.
But that would still require a mutex if you want to allow deleting an
object from a different thread than it was allocated in...
No it wouldn't--give each per-thread heap a lock-free free list. But
per-thread heaps for a GC are somewhat complicated. I think you'd have
to implement something like a read/write lock where the "writer" is
actually the thread that wants to collect.


Sean
Frits van Bommel
2007-01-08 22:15:04 UTC
Permalink
Post by Sean Kelly
Post by Frits van Bommel
Also: Why two locks per concat at all? A concat only requires one
allocation, so does a single alloc require two locks?
(I haven't looked much into the GC code, so this is just curiosity)
The code in internal/gc/gc.d first checks the size of the array to see
if a realloc is necessary, then it performs the realloc. So worst case
you're stuck with two locks per operation. However, this may not be the
case in the arraycat routines. It's been a while since I've looked at them.
Normal concats always allocate, so there's no need to check the size
(and so they don't).
You're probably thinking of appends.
Post by Sean Kelly
Post by Frits van Bommel
If you don't want to use them at all, I think the closest you'd get
would involve implementing thread-local heaps.
But that would still require a mutex if you want to allow deleting an
object from a different thread than it was allocated in...
No it wouldn't--give each per-thread heap a lock-free free list. But
per-thread heaps for a GC are somewhat complicated. I think you'd have
to implement something like a read/write lock where the "writer" is
actually the thread that wants to collect.
I replied to myself about 10 minutes later when I realized it was
probably possible :).
kenny
2007-01-08 21:01:36 UTC
Permalink
Honestly, it was a long time ago, so remembering is difficult. I didn't
have SVN installed back then :) I'm pretty sure it wasn't
multi-threaded. I think that we launched multiple processes. (eg. not
mutexes either) I know I used a lot of assoc arrays, and I also used a
lot of concatenation. The size of the code was quite huge too.. at least
8000 lines.

It could be a combination of those factors, and it may have been as
early of a version of 0.73. I dunno dude, I just remember it sucked.

I have not really been having bad experiences with the GC lately, accept
that, it's annoying at times with the peaks. Every request will give a
response time of 2ms, accept when the GC runs, and then it'll give a
50ms response. I suppose that's not that horrible, but it is a 25x
difference.

What I've wanted to do for a while now is to be able to set the size of
the pool of memory, and also want the variable to be set telling me some
form of information of how close the next collect is.. whether it's
largest memory block avail, or % fragmentation, or something, so I can
take the processes and tell them to stop accepting requests (and let one
of the other processes handle it) and run the GC so the GC doesn't run
while accepting requests.

I guess I can just estimate based off of the amount of requests it has
processed... but the other way sounds way cooler.
Post by Sean Kelly
Post by kenny
I also have experienced bad GC performance. I found it to be because
of the ~ operator on strings. The program is a daemon, and after it
had been running for a while, memory usage gets truly horrific, and
performance degrades very bad.
This was back on 0.140, so things may have changed since then...
Probably not. Assuming this is a multithreaded app, you have to pay for
a = b ~ c ~ d;
would result in six locks of the mutex protecting the GC to perform
allocations (I think anyway--I don't believe chained concats have
special handling). I think this would also result in two discarded
temporary buffers for the GC to clean up later. And since the Phobos GC
scans all such blocks by default...
My patch will address the scanning issue, but it won't address the mutex
issue--there's really no easy way to avoid that one.
Sean
Tom S
2007-01-08 17:14:06 UTC
Permalink
Post by kenny
Second, like someone else in this thread I also called fullCollect every
second.
This is not always an option. With a few hundred megs of memory
allocated, collections can last seconds. Resorting to malloc was the
only rescue for me...


Anyway, thanks for sharing your experiences guys. Now I know I'm not the
only one around having difficult relationships with the GC.


--
Tomasz Stachowiak
Sean Kelly
2007-01-08 17:04:38 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
import std.random;
void main() {
// The real memory use, ~20 mb
uint[] data;
data.length = 5_000_000;
foreach(inout x; data)
x = rand();
while(1) {
// simulate reading a few kb of data
uint[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(inout x; incoming)
x = rand();
// do something with the data...
}
}
The result may not be as expected. The program will use up all available
memory (for me crashing at about 2.7 gb of memory usage) and at the same
time run extremely slow due to the panicked GC scanning all memory over
and over.
The reason is the 20 mb of random data and the small 32-bit memory
address range of 4 GB. To understand how bad this is, 20 mb of random
data will result in _each_ 4k memory page on average having 5 random
pointers to it. Those spurious pointers are laying a dense mine-field
effectively disabling the GC.
This means that each time you rely on the GC (array appending/resizing,
Phobos function calls etc), you have a potential memory leak. (That is
unless all the program data is nothing but valid pointers/references or
all non-pointer data is hidden from the GC.)
The above program is of course just a toy illustrating the phenomena. In
a text processing program of mine the bulk of the data is short char[]
strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D recommended char[] and CoW idiom without getting
severe memory leaks.
The definite solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to
contain elements smaller than sizeof(void*) will probably help
tremendously. (I'd just have to make sure I'm not using dchar[] strings,
float or double data, or the DMD associative array implementation)
Since the patch keys on element size, the above code would still leak
horribly by default. However, the user can set/clear this "no scan"
flag explicitly, so if there are any memory blocks that are still
scanned by default, you can indicate that the GC should not scan them.
I think between the two, we should be in pretty good shape.


Sean
Andrey Khropov
2007-01-08 18:25:36 UTC
Permalink
After having fought a while with D programs with runaway memory leaks, I've
unfortunately had to come to the conclusion that the D GC is not ready for
production use. The problem is what I'd call "spurious pointers". That is
random data (strings, numbers, image data, audio or whatever) appearing to
the GC to be full of pointers to all over the memory space.
The definite solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to contain
elements smaller than sizeof(void*) will probably help tremendously. (I'd
just have to make sure I'm not using dchar[] strings, float or double data,
or the DMD associative array implementation)
That's what precise GC is all about.

I think it's the single biggest problem of the present D implementations. Some
of my measurements have shown that on memory-intensive applications current
Phobos GC could be 10x slower than MS.NET 2.0's GC:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digi
talmars.D&artnum=43991
--
AKhropov
%u
2007-01-08 19:23:39 UTC
Permalink
== Quote from Oskar Linde (oskar.lindeREM at OVEgmail.com)'s article
Post by Oskar Linde
This means that each time you rely on the GC (array
appending/resizing, Phobos function calls etc), you have a
potential memory leak.
Wrong. You have an intentional memory leak, from which the GC
cannot recover you.

The GC was designed to eventually free coders from memory leak
accidents. To enable this all variables are initialized by default--
-because this way no random data, including pointers, can survive.

In total your toy program establishes exactly that environment for
which the GC is known to fail.

There is an easy solution for such environments by increasing the
amount of memory needed for pointers by the factor two---or halving
the available memory if memory bounds are tight and one takes a
look from the other side.

But even this solution will not prevent the GC from sucking in all
data swapped out, if the GC is not coupled with the virtual memory
manager.
Miles
2007-01-08 20:40:09 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Some time ago I had the exact same problem, and I tried to convince
Walter of how bad this problem is. Others have said the same thing
before you and me. To Walter, "in practice, all integers vary from 0 to
100" or something like that, implying that the problem does not exist in
fact or is not relevant.

I should also mention that this is considered a serious security issue,
on the same class of the QuickSort algorithm.

For those not aware of secure programming practices: using QuickSort to
sort data input by a remote user is considered a bad programming
practice, because although QuickSort is O(n*log(n)) on average, there is
a set of input data that makes the algorithm O(n^2). Knowing the
implementation, an attacker may feed the application with carefully
crafted data that exploits this weakness.

In the D case, knowing how data are laid on memory, it is easy for a
remote attacker to feed data that, read as pointers, would point to
large (and otherwise temporary) blocks of data, making them leak. Stack
base randomization and other similar techniques helps preventing this,
but IMO, just hides the problem.

Another problem in current GC implementation, IMO, is that memory space
once allocated is never given back to the OS when disposed. What makes
this worse is that it is common for programs to use a big amount of
memory during initialization (like parsing XML data files), and never
use it again. Again, Walter already knows of this and doesn't think it
is a problem.
Frits van Bommel
2007-01-08 22:23:08 UTC
Permalink
Post by Miles
I should also mention that this is considered a serious security issue,
on the same class of the QuickSort algorithm.
For those not aware of secure programming practices: using QuickSort to
sort data input by a remote user is considered a bad programming
practice, because although QuickSort is O(n*log(n)) on average, there is
a set of input data that makes the algorithm O(n^2). Knowing the
implementation, an attacker may feed the application with carefully
crafted data that exploits this weakness.
Of course, if you're only concerned about malicious users sending
worst-case data (and not about worst-case data appearing randomly)
there's an easy fix: randomize the data in O(n) :).
Don't laugh, they taught me that in school. And it works, Quicksort will
remain O(n*log(n)) on average with that simple step performed
beforehand, no matter the input data.
You'll need to trust your randomizer though, if the attacker can
influence *that* you're just screwed and should use introsort or merge
sort or something else with guaranteed O(n*log(n)) behavior if that's
really what you want...
Miles
2007-01-09 15:45:04 UTC
Permalink
Post by Frits van Bommel
Of course, if you're only concerned about malicious users sending
worst-case data (and not about worst-case data appearing randomly)
there's an easy fix: randomize the data in O(n) :).
Yeah, sure. You can also just pick a random element per iteration
(instead of the first one or the middle one).
Kevin Bealer
2007-01-09 23:03:54 UTC
Permalink
I would think you are find if you pick your pivot at random(), and seed the random
number generator at the top of your program with microseconds-since-1970 or
whatever the convenient chaotic local number is.

Or you could probably just use D's [].sort method -- I think Walter said it uses
quick-sort but checks for poor performance and switches to a heap sort automatically.

Kevin
Frits van Bommel
2007-01-09 23:41:28 UTC
Permalink
Post by Kevin Bealer
I would think you are find if you pick your pivot at random(), and seed the random
number generator at the top of your program with microseconds-since-1970 or
whatever the convenient chaotic local number is.
IIRC randomizing the input is a rather general solution to this kind of
problem (not just with Quicksort).
Post by Kevin Bealer
Or you could probably just use D's [].sort method -- I think Walter said it uses
quick-sort but checks for poor performance and switches to a heap sort automatically.
That would be a variant of introsort. I believe I mentioned it.
Bill Baxter
2007-01-08 22:33:44 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Wow. That may be the problem I'm having too. I have some radial basis
function interpolation code that I put together just before the holiday
break, which I observed to be leaking like mad. First thing I need to
do today after catching up on mail is to figure out why, but maybe
you've just answered the question for me.

--bb
Bill Baxter
2007-01-09 04:33:21 UTC
Permalink
Here's a slightly less contrived version of Oskar's gc test.

import std.math;
import std.random;
import std.stdio;

void main() {
// The real memory use, ~40 mb
double[] data;
data.length = 5_000_000;
foreach(i, inout x; data) {
x = sin(cast(double)i/data.length);
//x = 1;
}
int count = 0;
int gcount = 0;
while(1) {
// simulate reading a few kb of data
double[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(i, inout x; incoming) {
x = sin(cast(double)i/incoming.length);
//x = 5;
}
// do something with the data...

// print status message every so often
count += incoming.length;
if (count > 1_000_000) {
count = 0;
gcount++;
writefln("%s processed", gcount);
}
}
}



This one uses doubles instead of uints and the data is the sin of some
number. These are _very_ realistic values for numeric data to have.
The same effect can be seen. Instead of hovering around 40MB, the
memory use grows and grows and performance slows and slows.

This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.

--bb
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
%u
2007-01-09 06:38:44 UTC
Permalink
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Here's a slightly less contrived version of Oskar's gc test.
import std.math;
import std.random;
import std.stdio;
void main() {
// The real memory use, ~40 mb
double[] data;
data.length = 5_000_000;
foreach(i, inout x; data) {
x = sin(cast(double)i/data.length);
//x = 1;
}
int count = 0;
int gcount = 0;
while(1) {
// simulate reading a few kb of data
double[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(i, inout x; incoming) {
x = sin(cast(double)i/incoming.length);
//x = 5;
}
// do something with the data...
// print status message every so often
count += incoming.length;
if (count > 1_000_000) {
count = 0;
gcount++;
writefln("%s processed", gcount);
}
}
}
This one uses doubles instead of uints and the data is the sin of some
number. These are _very_ realistic values for numeric data to have.
The same effect can be seen. Instead of hovering around 40MB, the
memory use grows and grows and performance slows and slows.
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
--bb
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
Agreed. This needs to be changed. Is the GC in that tango
library any better?
Sean Kelly
2007-01-09 07:20:55 UTC
Permalink
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Here's a slightly less contrived version of Oskar's gc test.
import std.math;
import std.random;
import std.stdio;
void main() {
// The real memory use, ~40 mb
double[] data;
data.length = 5_000_000;
foreach(i, inout x; data) {
x = sin(cast(double)i/data.length);
//x = 1;
}
int count = 0;
int gcount = 0;
while(1) {
// simulate reading a few kb of data
double[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(i, inout x; incoming) {
x = sin(cast(double)i/incoming.length);
//x = 5;
}
// do something with the data...
// print status message every so often
count += incoming.length;
if (count > 1_000_000) {
count = 0;
gcount++;
writefln("%s processed", gcount);
}
}
}
This one uses doubles instead of uints and the data is the sin of some
number. These are _very_ realistic values for numeric data to have.
The same effect can be seen. Instead of hovering around 40MB, the
memory use grows and grows and performance slows and slows.
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
--bb
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
Consider this simple program. It is designed to have a memory footprint
of about 20 mb and then continuously process data.
Agreed. This needs to be changed. Is the GC in that tango
library any better?
It's a modified version of the DMD GC. The "don't scan blocks
containing elements smaller than pointer size" feature is built-in, and
there is user-level control of that behavior on a per-block basis, among
other things. But it's still the same old mark/sweep GC at heart.


Sean
Luís Marques
2007-01-09 11:58:17 UTC
Permalink
Post by Sean Kelly
It's a modified version of the DMD GC. The "don't scan blocks
containing elements smaller than pointer size" feature is built-in, and
there is user-level control of that behavior on a per-block basis, among
other things. But it's still the same old mark/sweep GC at heart.
Does the new GC allow setting a hook to be informed of when a given
object was collected? (I need that)

Lu?s
Lutger
2007-01-09 12:29:17 UTC
Permalink
Post by Luís Marques
Post by Sean Kelly
It's a modified version of the DMD GC. The "don't scan blocks
containing elements smaller than pointer size" feature is built-in,
and there is user-level control of that behavior on a per-block basis,
among other things. But it's still the same old mark/sweep GC at heart.
Does the new GC allow setting a hook to be informed of when a given
object was collected? (I need that)
Lu?s
This is possible with these functions in Object:
final void notifyRegister(void delegate(Object) dg);
final void notifyUnRegister(void delegate(Object) dg);
Sean Kelly
2007-01-09 13:44:39 UTC
Permalink
Post by Lutger
Post by Luís Marques
Post by Sean Kelly
It's a modified version of the DMD GC. The "don't scan blocks
containing elements smaller than pointer size" feature is built-in,
and there is user-level control of that behavior on a per-block
basis, among other things. But it's still the same old mark/sweep GC
at heart.
Does the new GC allow setting a hook to be informed of when a given
object was collected? (I need that)
final void notifyRegister(void delegate(Object) dg);
final void notifyUnRegister(void delegate(Object) dg);
This option is available. There is also a global hook that can be
called when an object is collected by the GC (as opposed to destroyed
deterministically via delete):

bool myCollectHandler( Object o )
{
if( o.classinfo.name == "MyObject" )
{
(cast(MyObject) o).dispose();
return false;
}
return true;
}

setCollectHandler( &myCollectHandler );

Returning false from the hook tells the GC not to finalize the
object--ie. destroy its monitor but don't call its dtor. The purpose of
this method is twofold: to detect when memory is "leaked" and to allow
objects to be cleaned up differently if disposed via delete than via a
GC collection.


Sean
Sean Kelly
2007-01-09 16:06:21 UTC
Permalink
Post by Bill Baxter
Here's a slightly less contrived version of Oskar's gc test.
import std.math;
import std.random;
import std.stdio;
void main() {
// The real memory use, ~40 mb
double[] data;
data.length = 5_000_000;
foreach(i, inout x; data) {
x = sin(cast(double)i/data.length);
//x = 1;
}
int count = 0;
int gcount = 0;
while(1) {
// simulate reading a few kb of data
double[] incoming;
incoming.length = 1000 + rand() % 5000;
foreach(i, inout x; incoming) {
x = sin(cast(double)i/incoming.length);
//x = 5;
}
// do something with the data...
// print status message every so often
count += incoming.length;
if (count > 1_000_000) {
count = 0;
gcount++;
writefln("%s processed", gcount);
}
}
}
This one uses doubles instead of uints and the data is the sin of some
number. These are _very_ realistic values for numeric data to have. The
same effect can be seen. Instead of hovering around 40MB, the memory
use grows and grows and performance slows and slows.
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
the arrays using the following calls:

gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.

I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.


Sean
Ralf Schneider
2007-01-09 18:54:35 UTC
Permalink
Post by Sean Kelly
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior roughly
matched Phobos, with an 89 second run time and over 340MB of memory
consumed and growing steadily. Then I told the GC to not scan the arrays
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds with
43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether to
scan the block using TypeInfo, but at least it grants the programmer a way
to customize GC behavior somewhat to tune application performance. The
only stipulation with the current implementation is that block attributes
will not be preserved if an array is resized. This isn't terribly
difficult to fix, but I haven't done so yet.
It dosen't seem so hard for me to let the compiler set such an attribute on
arrays without pointers...

- Ralf
Frits van Bommel
2007-01-09 19:38:27 UTC
Permalink
Post by Ralf Schneider
It dosen't seem so hard for me to let the compiler set such an attribute on
arrays without pointers...
It's pretty hard if you can't modify the compiler ;).
Sean can't modify what DMD does (at least, not directly). He _can_
(directly) modify what the runtime library does by replacing it, which
is what he's done.
He (or anyone else, for that matter) might be able to implement this in
GDC though...
And Walter might be convinced to implement it in DMD (or, if it's
front-end only code, accept a patch that implements it).

Of course, that still leaves arrays of structs, which may contain both
pointers and non-pointers.
What we really need is a way for the GC to know what the type of the
memory is, or at least where the pointers are. This may be possible by
adding this info to TypeInfo & subclasses[1]. But then every memory
block would need a pointer to the relevant TypeInfo (or some condensed
form of this information, like flags for "only pointers" and "no
pointers", with TypeInfo pointer only if both are false).
This would definitely require some sort of compiler support though; it
would need to generate appropriate type information for structs, objects
(the actual memory, not the references) and unions.
It would then need to supply this information to the GC in the runtime,
requiring extra code generation.


[1]: This could be as simple as supplying a member function that
performs a callback for every offset containing a pointer.
Kyle Furlong
2007-01-09 22:13:18 UTC
Permalink
Post by Frits van Bommel
Post by Ralf Schneider
It dosen't seem so hard for me to let the compiler set such an
attribute on arrays without pointers...
It's pretty hard if you can't modify the compiler ;).
Sean can't modify what DMD does (at least, not directly). He _can_
(directly) modify what the runtime library does by replacing it, which
is what he's done.
He (or anyone else, for that matter) might be able to implement this in
GDC though...
And Walter might be convinced to implement it in DMD (or, if it's
front-end only code, accept a patch that implements it).
Of course, that still leaves arrays of structs, which may contain both
pointers and non-pointers.
What we really need is a way for the GC to know what the type of the
memory is, or at least where the pointers are. This may be possible by
adding this info to TypeInfo & subclasses[1]. But then every memory
block would need a pointer to the relevant TypeInfo (or some condensed
form of this information, like flags for "only pointers" and "no
pointers", with TypeInfo pointer only if both are false).
This would definitely require some sort of compiler support though; it
would need to generate appropriate type information for structs, objects
(the actual memory, not the references) and unions.
It would then need to supply this information to the GC in the runtime,
requiring extra code generation.
[1]: This could be as simple as supplying a member function that
performs a callback for every offset containing a pointer.
According to a conversation I had with Walter a month or two ago, this
is the direction we are going. With this sort of type information,
other, more modern GC implementations will become possible, one of which
I hope to write. :D
Sean Kelly
2007-01-09 22:41:34 UTC
Permalink
Post by Frits van Bommel
Post by Ralf Schneider
It dosen't seem so hard for me to let the compiler set such an
attribute on arrays without pointers...
It's pretty hard if you can't modify the compiler ;).
Sean can't modify what DMD does (at least, not directly). He _can_
(directly) modify what the runtime library does by replacing it, which
is what he's done.
He (or anyone else, for that matter) might be able to implement this in
GDC though...
And Walter might be convinced to implement it in DMD (or, if it's
front-end only code, accept a patch that implements it).
Of course, that still leaves arrays of structs, which may contain both
pointers and non-pointers.
What we really need is a way for the GC to know what the type of the
memory is, or at least where the pointers are. This may be possible by
adding this info to TypeInfo & subclasses[1]. But then every memory
block would need a pointer to the relevant TypeInfo (or some condensed
form of this information, like flags for "only pointers" and "no
pointers", with TypeInfo pointer only if both are false).
This would definitely require some sort of compiler support though; it
would need to generate appropriate type information for structs, objects
(the actual memory, not the references) and unions.
It would then need to supply this information to the GC in the runtime,
requiring extra code generation.
An interim alternative that came up in conversation would be to provide
similar functionality via template functions. With the .tupleof
property, it's possible to obtain a very accurate picture of what data
should be scanned. This would mean using a custom function instead of
'new', but it would certainly work:

MyClass c = create!(MyClass)( a, b, c );

The create function above could be written using TypeTuples and other
magic to allow direct parameter passing to the class ctor, making the
routine just as efficient as an in-language solution. Similar functions
could be written for arrays, exploiting some tricks in the language:

T[] resize(T)( T[] val, size_t len ) { ... }
int[] buf;
buf.resize( 1024 );

Again, perhaps not as clean as an in-language solution, but it could be
done today.

I'll look into doing something like this for Tango prior to release. It
shouldn't be too difficult, assuming the .tupleof property works as
described (some experimentation I made this morning suggests that it may
not).


Sean
Sean Kelly
2007-01-09 22:51:18 UTC
Permalink
Post by Sean Kelly
Similar functions
T[] resize(T)( T[] val, size_t len ) { ... }
Err, this should be:

T[] resize(T)( inout T[] val, size_t len ) { ... }

Sean
Frits van Bommel
2007-01-09 22:51:31 UTC
Permalink
Post by Sean Kelly
An interim alternative that came up in conversation would be to provide
similar functionality via template functions. With the .tupleof
property, it's possible to obtain a very accurate picture of what data
should be scanned. This would mean using a custom function instead of
I actually tried something like that a while back. Unfortunately DMD
kept segfaulting on me...
Just tried to compile that code again, still segfaults. I should
probably try cutting it down to something bugzilla-worthy.
zz
2007-01-09 23:00:09 UTC
Permalink
Post by Sean Kelly
I'll look into doing something like this for Tango prior to release. It
shouldn't be too difficult, assuming the .tupleof property works as
described (some experimentation I made this morning suggests that it may
not).
Sean
Some years ago we had some comparision between two PostScript (Not GS)
Interpreters without rendering to disk but generating loads of data
Harlequin RIP beat the other by large factors which we found out were
due to their garbage collector besides other things, this group also
built the GC for LispWorks and is now available open source, I don't
know how it's licenced but I know it was also used in Dylan from
Halequin (for those of you who remember the language).

Maybe you should look at some of their ideas, the link is bellow.

http://www.ravenbrook.com/project/mps/

Zz
Bill Baxter
2007-01-10 01:48:10 UTC
Permalink
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.

But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)

Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.

--bb
%u
2007-01-10 03:41:39 UTC
Permalink
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
Kyle Furlong
2007-01-10 09:48:21 UTC
Permalink
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
I do not think that this is even remotely valid. Think of everything you
arent doing in D. You arent hacking macros with the preprocessor, you
arent declaring methods outside of classes, globally overloading
operators... To say that D is C++ + GC + some other features is just
plain ignorant.
%u
2007-01-10 15:53:01 UTC
Permalink
== Quote from Kyle Furlong (kylefurlong at gmail.com)'s article
Post by Kyle Furlong
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
I do not think that this is even remotely valid. Think of everything you
arent doing in D. You arent hacking macros with the preprocessor, you
arent declaring methods outside of classes, globally overloading
operators... To say that D is C++ + GC + some other features is just
plain ignorant.
It's not ignorant from my viewpoint. You're assuming I care about the features you
described.
Jarrett Billingsley
2007-01-10 23:10:49 UTC
Permalink
"%u" <duser at hotmail.com> wrote in message
Post by %u
It's not ignorant from my viewpoint. You're assuming I care about the features you
described.
A bit OT, but would you please get a newsreader so you can stop being called
"%u"?
Kyle Furlong
2007-01-10 23:33:12 UTC
Permalink
Post by %u
== Quote from Kyle Furlong (kylefurlong at gmail.com)'s article
Post by Kyle Furlong
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
I do not think that this is even remotely valid. Think of everything you
arent doing in D. You arent hacking macros with the preprocessor, you
arent declaring methods outside of classes, globally overloading
operators... To say that D is C++ + GC + some other features is just
plain ignorant.
It's not ignorant from my viewpoint. You're assuming I care about the features you
described.
What I'm trying to say is, that to say a language is just a sum of its
features is incorrect. There are behaviors that arise from the
interaction of features that can be good or bad. In D, I've found that
these interactions are on the whole good. In C++, I've found that these
interactions are on the whole bad.

I'm not trying to say your experiences with both are invalid, maybe you
have been using them in a way in which the opposite is true. But for me,
C++ as a whole doesn't feel right, and D as a whole does.
Bill Baxter
2007-01-11 00:36:50 UTC
Permalink
Post by Kyle Furlong
Post by %u
== Quote from Kyle Furlong (kylefurlong at gmail.com)'s article
Post by Kyle Furlong
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
I do not think that this is even remotely valid. Think of everything you
arent doing in D. You arent hacking macros with the preprocessor, you
arent declaring methods outside of classes, globally overloading
operators... To say that D is C++ + GC + some other features is just
plain ignorant.
It's not ignorant from my viewpoint. You're assuming I care about the features you
described.
What I'm trying to say is, that to say a language is just a sum of its
features is incorrect. There are behaviors that arise from the
interaction of features that can be good or bad. In D, I've found that
these interactions are on the whole good. In C++, I've found that these
interactions are on the whole bad.
I'm not trying to say your experiences with both are invalid, maybe you
have been using them in a way in which the opposite is true. But for me,
C++ as a whole doesn't feel right, and D as a whole does.
Yes, that's certainly a valid point. But I think maybe you're taking
Mr. Percent a little too literally.

All I was trying to say originally was that, to me, D without garbage
collection -- despite all the other nice features -- is not compelling
enough to abandon C++.

Also I think Mr. Percent and I would both agree that whether or not the
language "feels right" is less important than being able to get our work
done in a timely manner. I'm interested in D primarily because I
believe it will help me get things done faster. Right now, I think with
GC and all the other nice D features, for the kinds of things I want to
do, D and C++ are just about neck-and-neck. What C++ lacks vs D, it
makes up for with great debuggers and tools, and gobs of free libraries.
Take away the GC from D, and what's left is simply not enough to
overcome the advantages of C++'s debuggers, tools, and libraries *for
the kinds of things I want to do*. Your mileage may vary.

--bb
Kyle Furlong
2007-01-11 01:12:02 UTC
Permalink
Post by Bill Baxter
Post by Kyle Furlong
Post by %u
== Quote from Kyle Furlong (kylefurlong at gmail.com)'s article
Post by Kyle Furlong
Post by %u
== Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
Post by Bill Baxter
Post by Sean Kelly
Post by Bill Baxter
This seems to be a very big issue. The GC seems to be pretty much
useless right now if you're going to have a lot of floating point data
in your app.
For what it's worth, I ran the test above with the modified GC in Tango,
for 10000 iterations of the "while(1)" loop. The default behavior
roughly matched Phobos, with an 89 second run time and over 340MB of
memory consumed and growing steadily. Then I told the GC to not scan
gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
A test with these changes in place dropped the run time to 7 seconds
with 43MB of memory consumed and not growing.
I grant that this isn't quite as nice as if D just figured out whether
to scan the block using TypeInfo, but at least it grants the programmer
a way to customize GC behavior somewhat to tune application performance.
The only stipulation with the current implementation is that block
attributes will not be preserved if an array is resized. This isn't
terribly difficult to fix, but I haven't done so yet.
I'd certainly be willing to do that sort of thing in my code to make the
problem go away in the short term. In my case all of those setAttr
calls would be hidden away in library code, so it wouldn't make life any
harder for everyday use.
But it's not much use as a workaround until Tango is actually
downloadable somewhere. :-)
Once Tango is available, though, that would certainly be better than the
alternative of rewriting my library to use malloc/free. If that's going
to be necessary then I think I'll just as soon go back to C++ where at
least I can use smart pointers. The integrated GC is the #1 reason I'm
using D in the first place, and if I can't realistically use that then I
might as well be using C++.
--bb
I feel the same way. Without GC, D is just C++ with a few more features.
I do not think that this is even remotely valid. Think of everything you
arent doing in D. You arent hacking macros with the preprocessor, you
arent declaring methods outside of classes, globally overloading
operators... To say that D is C++ + GC + some other features is just
plain ignorant.
It's not ignorant from my viewpoint. You're assuming I care about the features you
described.
What I'm trying to say is, that to say a language is just a sum of its
features is incorrect. There are behaviors that arise from the
interaction of features that can be good or bad. In D, I've found that
these interactions are on the whole good. In C++, I've found that
these interactions are on the whole bad.
I'm not trying to say your experiences with both are invalid, maybe
you have been using them in a way in which the opposite is true. But
for me, C++ as a whole doesn't feel right, and D as a whole does.
Yes, that's certainly a valid point. But I think maybe you're taking
Mr. Percent a little too literally.
All I was trying to say originally was that, to me, D without garbage
collection -- despite all the other nice features -- is not compelling
enough to abandon C++.
Also I think Mr. Percent and I would both agree that whether or not the
language "feels right" is less important than being able to get our work
done in a timely manner. I'm interested in D primarily because I
believe it will help me get things done faster. Right now, I think with
GC and all the other nice D features, for the kinds of things I want to
do, D and C++ are just about neck-and-neck. What C++ lacks vs D, it
makes up for with great debuggers and tools, and gobs of free libraries.
Take away the GC from D, and what's left is simply not enough to
overcome the advantages of C++'s debuggers, tools, and libraries *for
the kinds of things I want to do*. Your mileage may vary.
--bb
I agree. This is only good news for D, however, because libraries and
tools come with time. D is still young.
Oskar Linde
2007-01-09 19:09:15 UTC
Permalink
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
I feel kind of bad for making it sound like this is a problem related
specifically to the D garbage collector. It is rather a general and
apparently well known problem of all conservative garbage collectors.
The D garbage collector is still excellent for a large domain of problems.

Lots of people seem to be having similar problems though, so a better
understanding of under what conditions a conservative garbage collector
will and will not work seems to be called for.

I'd guess that a lot of problems we have with the conservative GC is
because we are at the end of the life time of the 32-bit address space.
As time has passed, our programs have used a larger and larger portion
of the available memory space. As one anonymous %u said, one solution
would be "increasing the amount of memory needed for pointers by the
factor two", which I interpret as moving to a 64-bit memory space.

I did some calculations to better understand what is going on. I've
probably done several errors though.

<theoretical rambling>
The probability of one single uniformly random spurious pointer
preventing the collection of an object of size s is

p = s/B,

with B being the size of the address space (2^b for b = 32 or 64 bits).
On a 64 bit architecture, the probability is obviously drastically reduced.

if g is the amount (bytes) of independent random data an application
needs, the risk of an object of size s not being collected is

P(g) = 1 - (1 - s/B) ^ (g/(b/8))

As one can see, the risk increases considerably as the objects gets
bigger. Increasing the amount of random data the application contains,
reduces the size of objects the GC will be able to handle satisfactory.

My example was kind of nasty in that it not only accumulated additional
random garbage in each iteration, but also caused heap fragmentation. A
kinder example would always create objects of the same size. The simpler
example would be:

* start with a static set of g bytes of random data
* for each iteration create n objects of size s (for simplicity, let
each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is
a good time to call gc.fullCollect();

After each such iteration, i, P_i*n_i objects will remain uncollected,
and will be added to the pool of random spurious pointers. I disregard
the small portion of objects that are uncollected because of spurious
pointers appearing in other objects left uncollected in the same
iteration. If this portion would be significant, you'd really have a
problem anyway and this will show up in later iterations.

g_{i+1} ~ g_i + P_i*n_i*s.

I generously assume that the safe memory area of the collected objects
are reused by the allocator in the next iteration. A few of those will
become unsafe by the recently added spurious pointers from uncollected
objects, but for the most part, those memory areas are now safe, and the
objects allocated/collected from them can be disregarded. The number of
objects that need to be allocated in unsafe memory areas for the next
iteration becomes:

n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)

This will of course not perfectly predict the real behavior, as it
relaxes the problem away from discrete units, but should still
adequately model the sample application. Using this, I tried to find the
breaking point, i.e. at what size objects need to be explicitly freed
instead of left to the GC.

"Verification":
Rewriting my original example to to a std.gc.fullCollect() every 10th
iteration, giving the following parameters:
g = 20 MB
n = 10

the model suggests allocating objects of size:
s = 2000 would result in almost no space overhead, stable at 20 mb
s = 3000 would result in a slight but stable space overhead,
s = 4000 would cause a run-away memory usage

running the program results in:
s = 2000 memory usage is stable at 20 mb
s = 3000 results in a small apparently unbounded memory leak
s = 4000 results in a unbounded memory leak

The model appears to be at least in the correct ballpark for this sample.

Theoretical results:
Those tables show the maximum object size the GC will be able to handle
with different static amounts of "random" data. By "being able to
handle", I mean that the application doesn't use more than 2x the
required amount of memory. The exact breaking point is very unstable and
going above it rapidly results in uncontrolled memory consumption.

32 bit arch, 100 objects

1 MB data 8000 bytes
5 MB data 4500 bytes
10 MB data 3000 bytes
20 MB data 2000 bytes
100 MB data 700 bytes
500 MB data 200 bytes
1000 MB data 100 bytes

32 bit arch, 1000 objects

1 MB data 3400 bytes
5 MB data 2200 bytes
10 MB data 1700 bytes
20 MB data 1200 bytes
100 MB data 500 bytes
500 MB data 150 bytes
1000 MB data 100 bytes

32 bit arch, 10000 objects
1 MB data 1300 bytes
5 MB data 1000 bytes
10 MB data 800 bytes
20 MB data 600 bytes
100 MB data 300 bytes
500 MB data 100 bytes
1000 MB data 75 bytes

Those figures need to be taken with a couple of grains of salt, but
should give a indication of at what object size one needs to manually
handle object lifetimes.

As a comparison -- the 64 bit haven:

64 bit arch, 100 objects
2 GB data 1.5 GB
100 GB data 600 MB
1 TB data 200 MB

64 bit arch, 1000 objects
2 GB data 350 MB
100 GB data 250 MB
1 TB data 150 MB

64 bit arch, 10000 objects
2 GB data 100 MB
100 GB data 75 MB
1 TB data 50 MB
</theoretical rambling>

Summary:

As far as I can see, what I have to do to avoid memory leaks with a
conservative GC, is one of the following:

1. move to a 64 bit architecture
2. manually handle all objects larger than a few hundred bytes (see above)
3. hide all non pointer data from the GC

It is a shame #2 is such a pain and that D doesn't offer any help such
as automatic ref-counting. Not having automatic ref-counting also
prevents neat solutions such as transparent CoW, and automatic handling
of scarce resources.

If I wouldn't have a strong belief that automatic ref-counting would be
addressed soon, I'd definitely consider going back to C++. Luckily,
before I'd give up waiting, 64 bit architectures will probably be in
great majority. ;)

/Oskar
Kyle Furlong
2007-01-09 22:16:09 UTC
Permalink
Post by Oskar Linde
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
I feel kind of bad for making it sound like this is a problem related
specifically to the D garbage collector. It is rather a general and
apparently well known problem of all conservative garbage collectors.
The D garbage collector is still excellent for a large domain of problems.
Lots of people seem to be having similar problems though, so a better
understanding of under what conditions a conservative garbage collector
will and will not work seems to be called for.
I'd guess that a lot of problems we have with the conservative GC is
because we are at the end of the life time of the 32-bit address space.
As time has passed, our programs have used a larger and larger portion
of the available memory space. As one anonymous %u said, one solution
would be "increasing the amount of memory needed for pointers by the
factor two", which I interpret as moving to a 64-bit memory space.
I did some calculations to better understand what is going on. I've
probably done several errors though.
<theoretical rambling>
The probability of one single uniformly random spurious pointer
preventing the collection of an object of size s is
p = s/B,
with B being the size of the address space (2^b for b = 32 or 64 bits).
On a 64 bit architecture, the probability is obviously drastically reduced.
if g is the amount (bytes) of independent random data an application
needs, the risk of an object of size s not being collected is
P(g) = 1 - (1 - s/B) ^ (g/(b/8))
As one can see, the risk increases considerably as the objects gets
bigger. Increasing the amount of random data the application contains,
reduces the size of objects the GC will be able to handle satisfactory.
My example was kind of nasty in that it not only accumulated additional
random garbage in each iteration, but also caused heap fragmentation. A
kinder example would always create objects of the same size. The simpler
* start with a static set of g bytes of random data
* for each iteration create n objects of size s (for simplicity, let
each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is
a good time to call gc.fullCollect();
After each such iteration, i, P_i*n_i objects will remain uncollected,
and will be added to the pool of random spurious pointers. I disregard
the small portion of objects that are uncollected because of spurious
pointers appearing in other objects left uncollected in the same
iteration. If this portion would be significant, you'd really have a
problem anyway and this will show up in later iterations.
g_{i+1} ~ g_i + P_i*n_i*s.
I generously assume that the safe memory area of the collected objects
are reused by the allocator in the next iteration. A few of those will
become unsafe by the recently added spurious pointers from uncollected
objects, but for the most part, those memory areas are now safe, and the
objects allocated/collected from them can be disregarded. The number of
objects that need to be allocated in unsafe memory areas for the next
n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
This will of course not perfectly predict the real behavior, as it
relaxes the problem away from discrete units, but should still
adequately model the sample application. Using this, I tried to find the
breaking point, i.e. at what size objects need to be explicitly freed
instead of left to the GC.
Rewriting my original example to to a std.gc.fullCollect() every 10th
g = 20 MB
n = 10
s = 2000 would result in almost no space overhead, stable at 20 mb
s = 3000 would result in a slight but stable space overhead,
s = 4000 would cause a run-away memory usage
s = 2000 memory usage is stable at 20 mb
s = 3000 results in a small apparently unbounded memory leak
s = 4000 results in a unbounded memory leak
The model appears to be at least in the correct ballpark for this sample.
Those tables show the maximum object size the GC will be able to handle
with different static amounts of "random" data. By "being able to
handle", I mean that the application doesn't use more than 2x the
required amount of memory. The exact breaking point is very unstable and
going above it rapidly results in uncontrolled memory consumption.
32 bit arch, 100 objects
1 MB data 8000 bytes
5 MB data 4500 bytes
10 MB data 3000 bytes
20 MB data 2000 bytes
100 MB data 700 bytes
500 MB data 200 bytes
1000 MB data 100 bytes
32 bit arch, 1000 objects
1 MB data 3400 bytes
5 MB data 2200 bytes
10 MB data 1700 bytes
20 MB data 1200 bytes
100 MB data 500 bytes
500 MB data 150 bytes
1000 MB data 100 bytes
32 bit arch, 10000 objects
1 MB data 1300 bytes
5 MB data 1000 bytes
10 MB data 800 bytes
20 MB data 600 bytes
100 MB data 300 bytes
500 MB data 100 bytes
1000 MB data 75 bytes
Those figures need to be taken with a couple of grains of salt, but
should give a indication of at what object size one needs to manually
handle object lifetimes.
64 bit arch, 100 objects
2 GB data 1.5 GB
100 GB data 600 MB
1 TB data 200 MB
64 bit arch, 1000 objects
2 GB data 350 MB
100 GB data 250 MB
1 TB data 150 MB
64 bit arch, 10000 objects
2 GB data 100 MB
100 GB data 75 MB
1 TB data 50 MB
</theoretical rambling>
As far as I can see, what I have to do to avoid memory leaks with a
1. move to a 64 bit architecture
2. manually handle all objects larger than a few hundred bytes (see above)
3. hide all non pointer data from the GC
It is a shame #2 is such a pain and that D doesn't offer any help such
as automatic ref-counting. Not having automatic ref-counting also
prevents neat solutions such as transparent CoW, and automatic handling
of scarce resources.
If I wouldn't have a strong belief that automatic ref-counting would be
addressed soon, I'd definitely consider going back to C++. Luckily,
before I'd give up waiting, 64 bit architectures will probably be in
great majority. ;)
/Oskar
Dont hold your breath for a reference counting implementation from DMD.
Walter doesnt want to add any overhead to assignments.
Johan Granberg
2007-01-09 22:28:20 UTC
Permalink
Post by Kyle Furlong
Post by Oskar Linde
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
I feel kind of bad for making it sound like this is a problem related
specifically to the D garbage collector. It is rather a general and
apparently well known problem of all conservative garbage collectors.
The D garbage collector is still excellent for a large domain of problems.
Lots of people seem to be having similar problems though, so a better
understanding of under what conditions a conservative garbage collector
will and will not work seems to be called for.
I'd guess that a lot of problems we have with the conservative GC is
because we are at the end of the life time of the 32-bit address space.
As time has passed, our programs have used a larger and larger portion
of the available memory space. As one anonymous %u said, one solution
would be "increasing the amount of memory needed for pointers by the
factor two", which I interpret as moving to a 64-bit memory space.
I did some calculations to better understand what is going on. I've
probably done several errors though.
<theoretical rambling>
The probability of one single uniformly random spurious pointer
preventing the collection of an object of size s is
p = s/B,
with B being the size of the address space (2^b for b = 32 or 64 bits).
On a 64 bit architecture, the probability is obviously drastically reduced.
if g is the amount (bytes) of independent random data an application
needs, the risk of an object of size s not being collected is
P(g) = 1 - (1 - s/B) ^ (g/(b/8))
As one can see, the risk increases considerably as the objects gets
bigger. Increasing the amount of random data the application contains,
reduces the size of objects the GC will be able to handle satisfactory.
My example was kind of nasty in that it not only accumulated additional
random garbage in each iteration, but also caused heap fragmentation. A
kinder example would always create objects of the same size. The simpler
* start with a static set of g bytes of random data
* for each iteration create n objects of size s (for simplicity, let
each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is
a good time to call gc.fullCollect();
After each such iteration, i, P_i*n_i objects will remain uncollected,
and will be added to the pool of random spurious pointers. I disregard
the small portion of objects that are uncollected because of spurious
pointers appearing in other objects left uncollected in the same
iteration. If this portion would be significant, you'd really have a
problem anyway and this will show up in later iterations.
g_{i+1} ~ g_i + P_i*n_i*s.
I generously assume that the safe memory area of the collected objects
are reused by the allocator in the next iteration. A few of those will
become unsafe by the recently added spurious pointers from uncollected
objects, but for the most part, those memory areas are now safe, and the
objects allocated/collected from them can be disregarded. The number of
objects that need to be allocated in unsafe memory areas for the next
n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
This will of course not perfectly predict the real behavior, as it
relaxes the problem away from discrete units, but should still
adequately model the sample application. Using this, I tried to find the
breaking point, i.e. at what size objects need to be explicitly freed
instead of left to the GC.
Rewriting my original example to to a std.gc.fullCollect() every 10th
g = 20 MB
n = 10
s = 2000 would result in almost no space overhead, stable at 20 mb
s = 3000 would result in a slight but stable space overhead,
s = 4000 would cause a run-away memory usage
s = 2000 memory usage is stable at 20 mb
s = 3000 results in a small apparently unbounded memory leak
s = 4000 results in a unbounded memory leak
The model appears to be at least in the correct ballpark for this sample.
Those tables show the maximum object size the GC will be able to handle
with different static amounts of "random" data. By "being able to
handle", I mean that the application doesn't use more than 2x the
required amount of memory. The exact breaking point is very unstable and
going above it rapidly results in uncontrolled memory consumption.
32 bit arch, 100 objects
1 MB data 8000 bytes
5 MB data 4500 bytes
10 MB data 3000 bytes
20 MB data 2000 bytes
100 MB data 700 bytes
500 MB data 200 bytes
1000 MB data 100 bytes
32 bit arch, 1000 objects
1 MB data 3400 bytes
5 MB data 2200 bytes
10 MB data 1700 bytes
20 MB data 1200 bytes
100 MB data 500 bytes
500 MB data 150 bytes
1000 MB data 100 bytes
32 bit arch, 10000 objects
1 MB data 1300 bytes
5 MB data 1000 bytes
10 MB data 800 bytes
20 MB data 600 bytes
100 MB data 300 bytes
500 MB data 100 bytes
1000 MB data 75 bytes
Those figures need to be taken with a couple of grains of salt, but
should give a indication of at what object size one needs to manually
handle object lifetimes.
64 bit arch, 100 objects
2 GB data 1.5 GB
100 GB data 600 MB
1 TB data 200 MB
64 bit arch, 1000 objects
2 GB data 350 MB
100 GB data 250 MB
1 TB data 150 MB
64 bit arch, 10000 objects
2 GB data 100 MB
100 GB data 75 MB
1 TB data 50 MB
</theoretical rambling>
As far as I can see, what I have to do to avoid memory leaks with a
1. move to a 64 bit architecture
2. manually handle all objects larger than a few hundred bytes (see
above) 3. hide all non pointer data from the GC
It is a shame #2 is such a pain and that D doesn't offer any help such
as automatic ref-counting. Not having automatic ref-counting also
prevents neat solutions such as transparent CoW, and automatic handling
of scarce resources.
If I wouldn't have a strong belief that automatic ref-counting would be
addressed soon, I'd definitely consider going back to C++. Luckily,
before I'd give up waiting, 64 bit architectures will probably be in
great majority. ;)
/Oskar
Dont hold your breath for a reference counting implementation from DMD.
Walter doesnt want to add any overhead to assignments.
Wouldn't it be possible to have classes declared "refcounted" in the same
way as scope? Then the extra cost would only apply to members of thouse
classes, possibly by disallowing uppcasts to non refcounted superclasses.

example:

refcounted class foo{
}

...

void bar(){
foo f=new foo();//refcount is one
auto k=f;//refcount is two
k=null;//back to one
//here the refcount reaches zero and the object is deleted
}
Thomas Kuehne
2007-01-09 23:04:41 UTC
Permalink
<snip>
Post by Kyle Furlong
Post by Oskar Linde
If I wouldn't have a strong belief that automatic ref-counting would be
addressed soon, I'd definitely consider going back to C++. Luckily,
before I'd give up waiting, 64 bit architectures will probably be in
great majority. ;)
Dont hold your breath for a reference counting implementation from DMD.
Walter doesnt want to add any overhead to assignments.
Reference counting implementations would cause a lot of pain and
overhead with array slices. A sweeping GC that has different
pools for non-pointer and may-contain-pointer memory areas is easier to
implement and causes less overhead. In addition reference counting can
cause interesting problems in multi-core systems.
Yes, using advance information all sweeping GC's can be exploited,
though user data should'nt usually be a problem as most of it ends up
in the no-pointer pool. The x86-stack however is a potential attack vector.

Thomas
Sean Kelly
2007-01-10 17:56:43 UTC
Permalink
Post by Oskar Linde
Post by Oskar Linde
After having fought a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is random data (strings, numbers, image data, audio or
whatever) appearing to the GC to be full of pointers to all over the
memory space.
I feel kind of bad for making it sound like this is a problem related
specifically to the D garbage collector. It is rather a general and
apparently well known problem of all conservative garbage collectors.
The D garbage collector is still excellent for a large domain of problems.
Lots of people seem to be having similar problems though, so a better
understanding of under what conditions a conservative garbage collector
will and will not work seems to be called for.
This link may be relevant:

http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html

"Bounding Space Usage of Conservative Garbage Collectors" - Hans J. Boehm


Sean
Benji Smith
2007-01-11 01:55:14 UTC
Permalink
The only comment I'd like to add is this:

At the same time as we're thinking about how to use a precise collector
instead of a conservative collector, we should also think about how to
implement a generational copying collector, since it might be necessary
to add new constructs to the core language (pointer pinning, for
example) to facilitate the new collector semantics.

--benji
Sean Kelly
2007-01-11 03:19:36 UTC
Permalink
Post by Benji Smith
At the same time as we're thinking about how to use a precise collector
instead of a conservative collector, we should also think about how to
implement a generational copying collector, since it might be necessary
to add new constructs to the core language (pointer pinning, for
example) to facilitate the new collector semantics.
Personally, I'm not convinced that a traditional moving GC will be the
way to go in D. Unless we are given some way to obtain type info for
data on the stack and in the static data segment, anything directly
referenced by these regions would have to be implicitly pinned. And
typically, long-lived data is referenced directly from such links. A
programmer could work around this limitation by using proxy classes, but
this seems like a lot of trouble just to allow for generational garbage
collection. However, there are variants on the idea that sound like
they have potential, one of which I believe is being discussed. As for
pointer pinning--it's fairly easy to add to the GC, so I don't see that
as an obstacle to future design. The bigger problem is sorting out
efficient implementations of Object.opHash and such for objects whose
only persistent uniquely identifying characteristic is their address.


Sean

Loading...