Discussion:
GC, the simple solution
Frank Benoit
2006-06-04 10:05:02 UTC
Permalink
Another garbage collector thread :)

The best garbage collector I can think of is a reference counting
one. To implement this:

* Every object needs a reference counter field.
* Code using references needs to increment/decrement
the counters.
* Destructors are called immediately if the counter reaches zero.

What you get is a precise, incremental, realtime garbage collector.
* all logical behaviour is deterministic
* Member objects are still valid while a destructor is called => no
more dispose/close patterns
* implement games and realtime apps without worrying about lag
* implement secure applications without worrying about the GC being
fooled by integer values that look like valid references
* consume only the memory actually needed
* no more need for the auto object
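For illustration, CPython happens to use exactly this scheme for its own objects, so the deterministic destruction claimed above can be sketched in a few lines (the `Resource` class and the log list are invented for the demo):

```python
class Resource:
    """Toy object whose destructor logs when it runs."""
    def __init__(self, log):
        self.log = log
    def __del__(self):
        self.log.append("destroyed")

log = []
r = Resource(log)
log.append("before del")
del r                      # last reference gone: refcount hits zero
log.append("after del")

# With pure reference counting the destructor runs at the `del`,
# not at some later collection pause.
assert log == ["before del", "destroyed", "after del"]
```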
Daniel Keep
2006-06-04 10:40:51 UTC
Permalink
"GC, the simple solution"

No such thing.

Summary:

* Ref counting by default: good grief no!
* Ref counting as an option: good grief yes!
* Is there a perfect solution for memory management: don't be ridiculous.
Post by Frank Benoit
Another garbage collector thread :)
The best garbage collector I can think about, is a reference counting
* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediately if the counter reaches zero.
What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
reference data values.
* consume only the needed memory
* no more need for the auto object
What you also get that you might not want:

* because references are on the object, you can't have array slices
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid

Those last two are the main problems with reference counting, along with
the other points and slightly added complexity in the compiler.

Python, however, does not suffer the cycle problem. Know why?

Because they have a *mark & sweep* garbage collector to clean up the cycles.
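That division of labour can be observed directly in CPython (a small sketch; disabling the cycle collector leaves only plain reference counting, and a weak reference lets us watch the object's lifetime):

```python
import gc
import weakref

class Node:
    pass

gc.disable()                  # leave only plain reference counting
a = Node()
b = Node()
a.other = b                   # create a two-object cycle
b.other = a
probe = weakref.ref(a)        # lets us observe a's lifetime

del a, b                      # both refcounts stay at 1 (the cycle)
assert probe() is not None    # refcounting alone could not free it

gc.collect()                  # the tracing collector finds the cycle
assert probe() is None        # ...and frees it
gc.enable()
```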

Seriously, I don't want to see the current GC replaced by a reference
counting scheme. The current system is just fine for the majority of
applications. Where it falls down are real-time systems which can't
allow for the non-deterministic GC pauses, and where you want to keep
your memory profile to a minimum. Anything that has pauses of activity
can just run a background GC thread to keep the memory footprint down.

To this end, I think implementing an incremental or concurrent GC should
be the next port of call for optimising D's memory management.

As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually. However, I do think that having built-in *support* for
Reference counting would be fantastic. The less that stuff like this is
left in the hands of the programmer, the better.

What I think we all need to keep in mind is that NO memory management
scheme is good for all cases. There will always be some weird corner
case which doesn't work well with our scheme of choice, and that we
don't give a rat's about, but is VERY important to someone else.

So far D does automatic GC (the best default, IMHO [1]), manual
management, and RAII. Throw optional ref counting in to the mix, and I
think we'll have pretty much 99% of cases covered.

-- Daniel

[1] The best default since it's unobtrusive, and *safe*: unless you're
doing strange things, it's very hard to leave dangling references with a
GC. This allows people to get on with writing the damn program, and
worry about managing memory later (if ever).
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Frank Benoit
2006-06-04 13:43:10 UTC
Permalink
In nearly every other GC implementation you will also need a read or
write barrier. With reference counting you have a write barrier.

Yes, the imprecise, non-incremental, "stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.

But a write barrier alone doesn't make ref counting ridiculous. Well, you
are right, it is not the solution for everything, but I think it is a
big step in the right direction. And a mark&sweep GC should be the option.
Post by Daniel Keep
* because references are on the object, you can't have array slices
I don't understand that point.
Post by Daniel Keep
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
In the actual implementation of the GC, every object's memory range is
registered with the GC. With that you can get the object's refcounter address.
Post by Daniel Keep
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
Performance has two sides. Throughput is one thing, worst-case latency the
other. We can think about techniques to optimize away unnecessary
increments/decrements.
BTW, all incremental GCs will also need read and/or write barriers, which
will have the same performance issues.
Post by Daniel Keep
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
Yes, all objects having child objects. But if you think that destructor
calls are a performance issue, you will get problems with mark&sweep
as well.
Post by Daniel Keep
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
How can that happen if we have them natively? Well, OK, e.g. if you store
the reference in a static variable, then it is not freed. But that is
really a programmer's bug. I think an advanced look at this will find
solutions. First we have to decide to go for ref counting :)
Post by Daniel Keep
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid
Right, here a manual ref=null is needed. And perhaps a mark&sweep
algorithm to detect such cycles in the debug version, called on request.
So if you know your program, you can avoid cycles, or you can break them
before releasing the last reference.

Doing a realtime GC seems not to be possible. The existing "realtime GCs"
seem to me to be incremental ones, which stop the world for a short, limited
time. And that time goes into your worst-case latency. So if I want to do
things with a latency of less than 100us, there is no chance to implement a
GC. Ref counting would be a solution, but this can only work if the whole
program does not depend on "stop the world".
Post by Daniel Keep
As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually. However, I do think that having built-in *support* for
Reference counting would be fantastic. The less that stuff like this is
left in the hands of the programmer, the better.
No, I can't.
* The dereference operator cannot be overridden.
* Existing classes, native arrays, etc. do not work like this.
I do not need a few classes managed with ref counting. I want to avoid
the "stop the world". And Phobos needs it too.

Where is the proof that ref counting is so slow? In mark&sweep you have
to scan the whole memory. If memory is low, you have to do that
often. That is really slow, then.
Daniel Keep
2006-06-05 01:15:51 UTC
Permalink
Post by Frank Benoit
In nearly every other GC implementation you will also need a read or
write barrier. With reference counting you have a write barrier.
Yes, the imprecise, non-incremental, "stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.
But a write barrier alone doesn't make ref counting ridiculous. Well you
are right, it is not the solution for all and everything, but I think a
big step in the right direction. And mark&sweep GC should be the option.
I guess we'll just have to disagree, then :)

Like I said before, I think the current GC should be the default since,
even if it's not the best in terms of latency, it's the *safest*.
There's no need for someone coding under it to worry about things like
cycles, etc. By having refcounting an option, we can say to
programmers: "turn this on if you like, but watch your back for cycles!"

<joking>Plus, the current system gives us bitchin' benchmarks with wc!
;)</joking>
Post by Frank Benoit
Post by Daniel Keep
* because references are on the object, you can't have array slices
Don't understand that point.
An array slice is a pointer and a length. The problem is the "pointer"
part: it need not point at the start of an object carrying a counter field.
Post by Frank Benoit
Post by Daniel Keep
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
In the actual implementation of the GC, every object's memory range is
registered with the GC. With that you can get the object's refcounter address.
Aaah, ok: you're putting the refcount somewhere in the allocated chunk.
Sorry, I thought you meant hide it as a member on objects.

However, I imagine that this is still going to do horrible things to
pointer operations in terms of speed. Every time you move or copy a
pointer, you'd have to ask the GC "ok, find which pool this pointer is
located in, then inc/dec the refcount."
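The lookup Frank proposes, and the per-copy cost it implies, might look roughly like this (a sketch with made-up addresses; a real GC would consult its pool table in O(log n) rather than a Python list, but the work per pointer copy is the same shape):

```python
import bisect

# Toy allocation registry: each GC-managed block as (base, size),
# kept sorted by base address.
bases = []           # sorted list of base addresses, for bisect
blocks = []          # parallel list of (base, size)

def register(base, size):
    i = bisect.bisect_left(bases, base)
    bases.insert(i, base)
    blocks.insert(i, (base, size))

def owning_block(ptr):
    """Map an interior pointer to the block containing it, or None."""
    i = bisect.bisect_right(bases, ptr) - 1
    if i >= 0:
        base, size = blocks[i]
        if base <= ptr < base + size:
            return base          # refcount would live at a fixed offset
    return None

register(0x1000, 0x40)
register(0x2000, 0x10)
assert owning_block(0x1008) == 0x1000   # pointer to a member
assert owning_block(0x1040) is None     # one past the end: no owner
assert owning_block(0x2000) == 0x2000
```

Every pointer copy or overwrite pays one such search plus the counter update, which is the overhead Daniel is pointing at.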
Post by Frank Benoit
Post by Daniel Keep
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
Performance has two sides. Throughput is one thing, worst latency the
other. We can think about techniques to optimize unnecessary
increments/decrements.
BTW all incremental GCs will also need read and/or write barriers, which
will have the same performance issues.
I concede that, as well as that the current GC has (probably) the worst
all-at-once latency you can get.
Post by Frank Benoit
Post by Daniel Keep
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
Yes, all objects having child objects. But if you think, that destructor
calls are a performance issue, you will get problems with the mark&sweep
also.
Well, I don't *need* destructors in most general objects under
mark&sweep. It's only when they're holding some external resource, and
that's when I usually manage them with auto or scoped destruction.

... which would admittedly be easier with refcounting, if a little bit
slower :)
Post by Frank Benoit
Post by Daniel Keep
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
How can that happen, if we have them native? Well ok, e.g. if you store
the reference to a static variable, then it is not freed. But this is
really a programmers bug. But I think that is something an advanced look
at it will find solutions. First we have to decide to go for ref counting :)
What I mean is that if I say "auto Foo x = ..." then I know that come
hell or high water, D is going to destroy that object at the end of
scope. Plus, because of auto rules, I *cannot* leak a reference to that
object: D simply won't let me.

As for the static variable case, it's no different to the current
implementation: you save a reference to something in a global or static
variable, and it's never going to die :)
Post by Frank Benoit
Post by Daniel Keep
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid
Right, here a manual ref=null is needed. And perhaps a mark&sweep
algorithm to detect such cycles in the debug version, called on request.
So if you know you program, you can avoid cycles, or you can break them
before releasing the last reference.
The problem I have is that even if I *do* "know" my program, I don't
want to have to worry about this. I just want to write my damn code!
This is one of the great advantages of D: if you don't give a rat's
about memory management, you don't have to.

That said, D *does* have one advantage: at compile time, the compiler
can determine if it's possible for an object to indirectly refer to
itself. Using this, it can insert something like this into the decref:

# static if( typeof(ptr).canPointToItself )
# registerForCycleChecking(ptr);

That way, you know that cycles will get checked for *eventually*.
Post by Frank Benoit
Doing a realtime GC seems to be not possible.
Oh, you'll get no argument from me THERE. :)
Post by Frank Benoit
The existing "realtime GC"
seems to me, incremental ones, which stop the world for a short limited
time. And that time goes into your worst latency. So if I want to do
things with a latency of less than 100us, there is no chance to implement a GC.
Ref counting would be a solution, but this can only work if the whole
program does not depend on "Stop-the-world".
I saw an interesting paper once that was talking about using a
tri-colour mark&sweep collector that was fully concurrent: sounded
heinously complicated to write, but could run a full collect in the
background of your app.

Buggered if I can find the damn thing again, though...
Post by Frank Benoit
Post by Daniel Keep
As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually. However, I do think that having built-in *support* for
Reference counting would be fantastic. The less that stuff like this is
left in the hands of the programmer, the better.
No, i can't.
* The dereference operator cannot be overwritten.
MyRef.value

Yeah, it's not that great, but at least it's *possible* :)
Post by Frank Benoit
* Existing classes, native arrays etc. does not work like this.
Fair enough.
Post by Frank Benoit
I do not need a few classes managed with ref counting. I want to avoid
the "Stop the world". And phobos needs it.
I don't think phobos "needs" it. What I think would help immensely would be
if phobos were easier to rebuild with different compiler flags,
collector, etc.

Would you be happy if it was trivial to recompile phobos using
refcounting instead of the current GC, and then use that?

# build -phobos -collector=refcount -version=SomethingElse
# build MyProggy -collector=refcount -- uses phobos_refcount.lib
Post by Frank Benoit
Where is the proof that ref counting is so slow? In mark&sweep you have
to scan the whole memory. If the memory is low, you have to do that
often. That is really slow, then.
I don't have any proof on me, although knowing the internet I'm sure I
could find some "proof" without too much trouble :P

And yes; if memory is low, m&s will probably take a long time.
Actually, I should find out how long that is. Would be interesting to
know exactly how long it takes to collect 512 meg of garbage :P

Personally, I don't mind how D's default collector is implemented, so
long as it is efficient in the general case, and I don't have to know
anything about the way it works in order to write code.

That said, I'd still like to see D support collectors other than the
base GC. Ideally, I think D should support at least:

# -collector=simple -- current style: no special support needed
# -collector=writebarrier -- calls std.gc.writeBarrier(ptr) on writes
# -collector=refcount -- calls std.gc.writeBarrier(ptr), as well as
# std.gc.incref and std.gc.decref as
# appropriate.

That way, we can all have our cake, be it chocolate cake, fruit cake or
cheesecake. Plus, we even get to *eat* it :)

-- Daniel
Lars Ivar Igesund
2006-06-05 09:04:58 UTC
Permalink
Post by Daniel Keep
Post by Frank Benoit
In nearly every other GC implementation you will also need a read or
write barrier. With reference counting you have a write barrier.
Yes, the imprecise, non-incremental, "stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.
But a write barrier alone doesn't make ref counting ridiculous. Well you
are right, it is not the solution for all and everything, but I think a
big step in the right direction. And mark&sweep GC should be the option.
I guess we'll just have to disagree, then :)
Like I said before, I think the current GC should be the default since,
even if it's not the best in terms of latency, it's the *safest*.
There's no need for someone coding under it to worry about things like
cycles, etc.
Well, actually, the current GC is kinda bad with cycles. Try to make a
circular linked list where you keep a single additional reference to one of
the elements. Then remove the reference and collect. In theory this memory
should now "just" be lost/leaked, but you might also experience crashes. I
think Chris Miller had some interesting revelations on the subject.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Larry Evans
2006-06-05 21:27:23 UTC
Permalink
Post by Daniel Keep
"GC, the simple solution"
[snip]
Post by Daniel Keep
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid
I think someone else in the thread mentioned this as a programmer error.
However, assuming the programmer intended this, then this programmer
must expect that the MS GC would break the cycle somewhere before
calling the destructors and he must not care where the cycle was
broken, because otherwise he would have used a weak pointer
to close the cycle where he wanted the break to occur.

Anyway, code for doing this GC breaking of cycles in RC
is at:

http://tinyurl.com/reuzl

See code after comment:

* Breaks all arcs in cycle_arcs.

As mentioned in the code comments, this is Christopher's
method for collecting cycles and suffers a delay since it
keeps all smart ptrs in a list and processes the entire list.
Other methods don't keep this list but do a local mark-sweep
at each rc decrement with obvious time penalties. A compromise
just uses a queue which, when it reaches a certain size,
plays the same role as Christopher's list of all smart pointers.
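The weak-pointer escape hatch mentioned above can be sketched with Python's weakref module (class names are invented for the demo; under CPython's refcounting the destruction is immediate, exactly because the upward link does not contribute to the count):

```python
import weakref

destroyed = []

class Parent:
    def __del__(self):
        destroyed.append("parent")

class Child:
    pass

parent = Parent()
child = Child()
parent.child = child               # strong reference downward
child.parent = weakref.ref(parent) # weak reference upward: no cycle

del parent                         # refcount reaches zero immediately
assert destroyed == ["parent"]
assert child.parent() is None      # the weak reference has gone dead
```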
Post by Daniel Keep
Those last two are the main problem with reference counting, along with
the others and slightly added complexity in the compiler.
Python, however, does not suffer the cycle problem. Know why?
Because they have a *mark & sweep* garbage collector to clean up the cycles.
Then it must not automatically call destructors since then it would
suffer the 2nd problem above.
[snip]
Post by Daniel Keep
As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually. However, I do think that having built-in *support* for
Reference counting would be fantastic. The less that stuff like this is
left in the hands of the programmer, the better.
Smart pointers would be a great help here, but since
D doesn't allow user definition of the assignment operator:

http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html

that's not easy. I'm a newbie, so what's the best way to implement
something like a smart pointer without having to manually:

increment rc1;
decrement rc2;
assign the pointer to its target location;
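Without assignment overloading, those three steps indeed have to be spelled out by hand, and one classic detail matters: increment the new target *before* decrementing the old one, or self-assignment frees the very object you are about to store. A sketch (the `RC`/`assign`/`Holder` names are invented stand-ins for D classes):

```python
destroyed = []

class RC:
    """Manually refcounted object (stand-in for a D class)."""
    def __init__(self, name):
        self.name = name
        self.rc = 1               # the creating reference

    def incref(self):
        self.rc += 1

    def decref(self):
        self.rc -= 1
        if self.rc == 0:
            destroyed.append(self.name)   # "destructor" runs here

def assign(holder, attr, new):
    """holder.attr = new, with refcounts maintained."""
    if new is not None:
        new.incref()              # FIRST: protects against self-assignment
    old = getattr(holder, attr)
    if old is not None:
        old.decref()              # the old target may be destroyed now
    setattr(holder, attr, new)

class Holder:
    def __init__(self):
        self.ref = None

h = Holder()
x = RC("x")
assign(h, "ref", x)               # rc: 2
x.decref()                        # drop the creating reference; rc: 1
assign(h, "ref", h.ref)           # self-assignment: must be a no-op
assert destroyed == []
assign(h, "ref", None)            # rc: 0 -> destructor fires
assert destroyed == ["x"]
```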
Daniel Keep
2006-06-08 15:07:19 UTC
Permalink
Sorry for the late (and short) reply: exams breathing down my neck and
all :)
Post by Larry Evans
Post by Daniel Keep
"GC, the simple solution"
[snip]
Post by Daniel Keep
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be
valid
I think someone else in the thread mentioned this as a programmer error.
However, assuming the programmer intended this, then this programmer
must expect that the MS GC would break the cycle somewhere before
calling the destructors and he must not care where the cycle was
broken, because otherwise he would have used a weak pointer
to close the cycle where he wanted the break to occur.
Anyway, code for doing this GC breaking of cycles in RC
http://tinyurl.com/reuzl
* Breaks all arcs in cycle_arcs.
As mentioned in the code comments, this is Christopher's
method for collecting cycles and suffers a delay since it
keeps all smart ptrs in a list and processes the entire list.
Other methods don't keep this list but do a local mark-sweep
at each rc decrement with obvious time penalties. A compromise
just uses a queue which, when it reaches a certain size,
plays the same role as Christopher's list of all smart pointers.
Post by Daniel Keep
Those last two are the main problem with reference counting, along with
the others and slightly added complexity in the compiler.
Python, however, does not suffer the cycle problem. Know why?
Because they have a *mark & sweep* garbage collector to clean up the
cycles.
Then it must not automatically call destructors since then it would
suffer the 2nd problem above.
I haven't read the above link since I really don't need to be distracted
any more than I already am at the moment. I'll read that link later on
once I'm not quite so swamped :)

However, I will say this: if you have a cycle in Python, it will clean
it up normally PROVIDED destructors are not involved. If they are, then
it just moves the whole cycle to a dead object list, since there is no
safe way to break the cycle. At that point, it's up to the programmer
to deal with it.
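(That matches the CPython of that era; for what it's worth, since Python 3.4 and PEP 442 the collector can run __del__ on objects in a cycle and reclaim them, so gc.garbage stays empty. A quick check under a modern interpreter:)

```python
import gc

finalized = []

class Node:
    def __del__(self):
        finalized.append("node")

a = Node()
b = Node()
a.other = b          # a cycle in which both members have a destructor
b.other = a
del a, b

gc.collect()
# Python >= 3.4: both destructors ran and nothing was quarantined.
assert finalized == ["node", "node"]
assert gc.garbage == []
```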
Post by Larry Evans
[snip]
Post by Daniel Keep
As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually. However, I do think that having built-in *support* for
Reference counting would be fantastic. The less that stuff like this is
left in the hands of the programmer, the better.
Smart pointers would be a great help here, but since
http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html
that's not easy. I'm a newbie, so what's the best way to implement
increment rc1;
decrement rc2;
assign the pointer to its target location;
Not really, sorry. Your best bet might be with a preprocessor. I've
had plans to write an AST-based preprocessor for a while, but that's off
in the far, far future. Maybe give Build 3.0 a look? In that case,
you'd want to write a macro like:

rc2 #= rc1;
==>
(rc1).incref(); (rc2).decref(); rc2 = (rc1);

Again, sorry for the terse reply.

-- Daniel
Lionello Lunesu
2006-06-04 12:39:34 UTC
Permalink
Post by Frank Benoit
* Destructors are called immediately if the counter reaches zero.
This is not such a good thing! You don't really know who releases the last
reference, so you don't really know when the object is deleted. If the
object is being referenced from multiple threads, you don't know in what
thread the destructor will run.
Post by Frank Benoit
* all logic behaviour is deterministic
See above.
Post by Frank Benoit
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
When an object is being destructed, the references it has to other objects
are being released. So in the destructor of object B, object A might have
released about half its references.
Post by Frank Benoit
* consume only the needed memory
What about circular references? They'll never get released, no clean-up.

L.
Frank Benoit
2006-06-04 13:52:52 UTC
Permalink
Post by Lionello Lunesu
Post by Frank Benoit
* Destructors are called immediately if the counter reaches zero.
This is not such a good thing! You don't really know who releases the last
reference, so you don't really know when the object is deleted. If the
object is being referenced from multiple threads, you don't know in what
thread the destructor will run.
Where is the disadvantage? If you release concurrently in different
threads, you need some kind of synchronization. Then you know it. But this
is no issue of ref counting.
If you release a reference under the phobos GC, the object is collected
"perhaps" the next time, if there are no valid integer values
"pointing" to your object. And after collecting, the sequence of
destructor calls is not defined.
Post by Lionello Lunesu
Post by Frank Benoit
* all logic behaviour is deterministic
See above.
See above.
Post by Lionello Lunesu
Post by Frank Benoit
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
When an object is being destructed, the references it has to other objects
are being released. So in the destructor of object B, object A might have
released about half its references.
If object B's dtor is called, its member variables are valid, and so is
object A. B.~this() can do something with A. After the dtor has
finished, the reference to A is nulled, and A is perhaps also deleted.
Seems perfect to me.
Post by Lionello Lunesu
Post by Frank Benoit
* consume only the needed memory
What about circular references? They'll never get released, no clean-up.
Yes, they need a solution. Perhaps a fallback mark&sweep collector to
check for cycles.
Post by Lionello Lunesu
L.
Lionello Lunesu
2006-06-04 14:19:46 UTC
Permalink
"Frank Benoit" <keinfarbton at nospam.xyz> wrote in message
Post by Frank Benoit
Post by Lionello Lunesu
Post by Frank Benoit
* Destructors are called immediately if the counter reaches zero.
This is not such a good thing! You don't really know who releases the last
reference, so you don't really know when the object is deleted. If the
object is being referenced from multiple threads, you don't know in what
thread the destructor will run.
Where is the disadvantage? If you release concurrently in different
threads, you need some kind of synchronization. Then you know it. But this
is no issue of ref counting.
If you release a reference in the phobos GC, the object is collected
"perhaps" the next time. If there are no valid integer values,
"pointing"to your object. And after collecting, the sequence of
destructor calls is not defined.
There are many resources that must be released by the thread by which they
were acquired. There's not much synchronisation can do about that. If the last
reference just happens to be released in a thread other than the one that
created the object, then the resource will be released in the wrong thread.

As I've understood from Boehm's paper, one advantage of a GC is to fix this
issue: a GC collect will call finalizers for the objects that were created
in that thread. It's like keeping an object from being destructed until the
thread that created the object calls malloc (which is where the GC does its
thing).

Anyway, Boehm's paper is a good read. He (she?) also addresses the problems
with reference counting.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Post by Frank Benoit
Post by Lionello Lunesu
Post by Frank Benoit
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
When an object is being destructed, the references it has to other objects
are being released. So in the destructor of object B, object A might have
released about half its references.
If object B Dtor is called, its member variables are valid, and valid is
the object A. The B.~this() can do something with A. After the Dtor is
finished, the reference to A is nulled, and A is perhaps also deleted.
Seems perfect to me.
I meant my example like this: object A has a reference to object B. During
the destruction of A, the references it has to other objects are released,
causing some objects to be destructed, like B. Now B might (indirectly) use
data from A, but A is in the middle of destruction. What I mean to say with
all this is that you still have to be careful with your destructors.
Especially when using a strong-pointer template. You can't always tell what
state other objects are in within a destructor. They might be halfway through
destruction, having released some, but not all, of their references.

L.
Frank Benoit
2006-06-04 14:41:20 UTC
Permalink
Post by Lionello Lunesu
As I've understood from Boehm's paper, one advantage of a GC is to fix this
issue: a GC collect will call finalizers for the objects that were created
in that thread. It's like keeping an object from being destructed until the
thread that created the object calls malloc
This is not true for the phobos GC. The GC runs in the thread which
calls new or gc.fullCollect.
But the dtors are not called while other threads are running, because
the GC pauses all other threads. This is the "stop the world".
Post by Lionello Lunesu
I meant my example like this: object A has a reference to object B. During
the destruction of A, the references it has to other objects are released,
causing some objects being destructed, like B. Now B might (indirectly) use
data from A, but A's in the middle of destruction. What I mean to tell with
all this is that you still have to be carefull with your destructors.
Especially when using a strong-pointer template. You can't always tell what
state other objects are in in a destructor. They might be half way during
destruction, having released some, but not all, of their references.
Object A cannot be destroyed if it is reachable directly or indirectly.
That is what the ref counting is for. So if A.~this() is called, nobody else
has a reference to A.
Lars Ivar Igesund
2006-06-04 15:29:30 UTC
Permalink
Post by Frank Benoit
Another garbage collector thread :)
The best garbage collector I can think about, is a reference counting
* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediately if the counter reaches zero.
What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
reference data values.
* consume only the needed memory
* no more need for the auto object
Not entirely true in the general case, Frank ;)

There are always techniques to get around problems with an algorithm,
depending on what you want to achieve, but in general, a reference counted
GC is said to have these _negative_ properties: (paraphrased from Garbage
Collection: Algorithms for Automatic Dynamic Memory Management by Jones and
Lins)

* The cost of removing the last pointer to an object is unbounded
* Even if the general latency is good, the total overhead of adjusting
references is significantly greater than that of a tracing GC
* It has a substantial space overhead, making it less useful for smaller
heaps
* Cyclic references (already mentioned)

I think these make RC less suited for the general case, whereas it might
fit very well in a tightly controlled system where low latency is required
(which I'm well aware that you need ;).

IMO, the best general GC (the default) would be one which can handle as many
cases as possible, and which is manipulable in those (preferably known)
cases which it doesn't handle out of the box. I think one would need to
combine features from several GC techniques (possibly also RC) to achieve
this.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Frank Benoit
2006-06-05 08:46:14 UTC
Permalink
Post by Lars Ivar Igesund
* The cost of removing the last pointer to an object is unbounded
delete can be unbounded? I didn't know that.
Post by Lars Ivar Igesund
* Even if the general latency is good, the total overhead of adjusting
references is significantly greater than that of a tracing GC
* It has a substantial space overhead, making it less useful for smaller
heaps
In code size, yes, but in heap size?
I thought the tracing GC needs much more heap because it should be
called very seldom.
Post by Lars Ivar Igesund
* Cyclic references (already mentioned)
I think these makes RC less suited for the general case, whereas it might
fit very well in a tightly controlled system where low latency is required
(which I'm well aware that you need ;).
IMO, the best general GC (the default) would be one which can handle as many
cases as possible, and which is manipulable in those (preferably known)
cases which it don't handle out of the box. I think that one need to
combine features from several GC techniques (possibly also RC) to achieve
this.
agreed.
Lars Ivar Igesund
2006-06-05 09:39:07 UTC
Permalink
Post by Frank Benoit
Post by Lars Ivar Igesund
* The cost of removing the last pointer to an object is unbounded
delete can be unbound? I didn't know that.
The reason is that a delete might trigger an unknown number of additional
decrefs leading to more deletes.
Post by Frank Benoit
Post by Lars Ivar Igesund
* It has a substantial space overhead, making it less useful for smaller
heaps
In codesize yes, but in heap size?
I though the tracing GC does need much more heap because it should be
called very seldom.
Hmm, might have paraphrased somewhat wildly there, but the space overhead
_is_ considered a drawback, and reducing it will most likely lead to more
computation overhead instead.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Sean Kelly
2006-06-05 16:03:12 UTC
Permalink
Post by Lars Ivar Igesund
Post by Frank Benoit
Post by Lars Ivar Igesund
* The cost of removing the last pointer to an object is unbounded
delete can be unbound? I didn't know that.
The reason is that a delete might trigger an unknown additional amount of
decrefs leading to more deletes.
And some allocators coalesce adjacent free blocks aggressively (i.e.
during delete). Not to mention the fact that the allocator data is
probably protected by a mutex, etc. In general, if the allocator
documentation makes no guarantees about progress then you may at least
be stuck on a mutex while other (potentially low-priority) threads are
making an (unbounded) allocation.


Sean
Walter Bright
2006-06-04 21:42:35 UTC
Permalink
Post by Frank Benoit
The best garbage collector I can think about, is a reference counting
one.
Reference counting has its advantages, but some severe disadvantages:

1) cyclical data structures won't get free'd

2) every pointer copy requires an increment and a corresponding
decrement - including when simply passing a reference to a function

3) in a multithreaded app, the incs and decs must be synchronized

4) exception handlers (finally's) must be inserted to handle all the
decs so there are no leaks. Contrary to assertions otherwise, there is
no such thing as "zero overhead exceptions."

5) in order to support slicing and interior pointers, as well as
supporting reference counting on arbitrary allocations of non-object
data, a separate "wrapper" object must be allocated for each allocation
to be ref counted. This essentially doubles the number of allocations
needed.

6) The wrapper object will mean that all pointers will need to be
double-dereferenced to access the data.

7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.

8) Ref counting can fragment the heap thereby consuming more memory just
like the gc can, though the gc typically will consume more memory overall.

9) Ref counting does not eliminate latency problems, it just reduces them.

The proposed C++ shared_ptr<>, which implements ref counting, suffers
from all these faults. I haven't seen a heads up benchmark of
shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
turned out to be a significant loser in terms of both performance and
memory consumption.

That said, I'd like for D to optionally support some form of ref
counting, as rc is better for managing scarce resources like file handles.
Frank Benoit
2006-06-05 01:02:40 UTC
Permalink
Ok, I see now that my "best and simplest" solution is neither of the two :)
But I don't want to throw the idea away so fast. Please let's discuss it
a bit further.

Let's think about the scenario of having reference counting (RC) and
mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
Python. If Python uses this, the idea cannot be so crazy.
Post by Walter Bright
1) cyclical data structures won't get free'd
Combine RC and MS. So cycles are detected. It should be possible to
avoid cycles at all, if you don't want to have any MS pause.
Post by Walter Bright
2) every pointer copy requires an increment and a corresponding
decrement - including when simply passing a reference to a function
This is true for a smart pointer class. But is it really true for
compiler-supported RC?
If I call a func/method, I hold a valid reference to the object. While
this reference is copied down the stack, these are only temporary
references. And all of them are released before the call returns to the
caller. You can ignore them all for RC. Only if the reference is stored
does the ref counter have to be incremented. This is only possible if it is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't an atomic inc/dec enough? (I don't know anything about the
performance of the asm "lock" instruction.)
Post by Walter Bright
4) exception handlers (finally's) must be inserted to handle all the
decs so there are no leaks. Contrary to assertions otherwise, there is
no such thing as "zero overhead exceptions."
Yes, this is additional overhead in code size and runtime. And additional
post-destructors are needed, to set to null all member refs
which weren't nulled by the dtor.
Post by Walter Bright
5) in order to support slicing and interior pointers, as well as
supporting reference counting on arbitrary allocations of non-object
data, a separate "wrapper" object must be allocated for each allocation
to be ref counted. This essentially doubles the number of allocations
needed.
The MS GC already has a memory registry. Isn't it stored there where an
object begins? Let the ref counter live in this registry. This should work
with all types, and with slicing and object member references.
Post by Walter Bright
6) The wrapper object will mean that all pointers will need to be
double-dereferenced to access the data.
=> 5.)
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here some more manual work is necessary.
If you hand the last reference to a C lib, isn't this a problem for the
MS GC too?
Post by Walter Bright
8) Ref counting can fragment the heap thereby consuming more memory just
like the gc can, though the gc typically will consume more memory overall.
Isn't the actual GC implementation also fragmenting the heap?
But I don't have any idea about this issue.
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you see the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs unbounded time to complete. But it does not need to stop all
other threads; e.g. hardware interrupts don't need to be disabled.
Post by Walter Bright
The proposed C++ shared_ptr<>, which implements ref counting, suffers
from all these faults. I haven't seen a heads up benchmark of
shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
turned out to be a significant loser in terms of both performance and
memory consumption.
That said, I'd like for D to optionally support some form of ref
counting, as rc is better for managing scarce resources like file handles.
As I said, I would be happy to be able to switch the whole runtime
system to RC. Losing 50% performance would be acceptable for me
if I get latency < 100us.

Frank Benoit
Daniel Keep
2006-06-05 02:21:54 UTC
Permalink
Just a few comments, since you invoked my name :P
Post by Frank Benoit
Let's think about the scenario of having reference counting (RC) and
mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
Python. If Python uses this, the idea cannot be so crazy.
Just wanted to point out that if I remember correctly (and I *could* be
wrong), the cycle checking is only done on decrefs, since that's the
only time that cycles become a problem.
Post by Frank Benoit
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough? (Don't know anything about the
performance of the asm "lock" instruction)
They definitely have to be synched: because in decref you've got to:

1. ref--;
2. if( ref == 0 ) free(ptr);
3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

If not more. You then also need the incs synched with that, otherwise
you could end up "ref++"ing in the middle of "free(ptr)".
Post by Frank Benoit
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?
MS' GC doesn't use refcounting (I assume you're talking about .NET,
btw). In fact, to solve this problem, it uses pinning:

"""
When the runtime marshaler sees that your code is passing to native code
a reference to a managed reference object, it automatically pins the
object. What this means is that the object is put in a special state
where the garbage collector will neither move the object in memory nor
remove the object from memory. ...

When the native function returns, the marshaled object is automatically
unpinned. Automatic pinning is very convenient, but it raises another
question. What happens if the native function caches the pointer for use
later on? When the function returns, won't the collector be free to move
the object? The answer is yes, and the solution for your code in such a
situation is to manually pin the object using the
System.Runtime.InteropServices.GCHandle type.
"""

-- http://msdn.microsoft.com/msdnmag/issues/04/10/NET/
Post by Frank Benoit
Post by Walter Bright
8) Ref counting can fragment the heap thereby consuming more memory just
like the gc can, though the gc typically will consume more memory overall.
Isn't the actual GC implementation also fragmenting the heap?
But I don't have any idea to this issue.
Not as badly. The current GC is pretty good at keeping fragmentation to
a minimum. Fragmentation happens when you allocate lots of little
objects, then free some but not all, and end up with teeny-tiny "holes"
in memory that are hard to fill.

I think Walter said this because of his comment on using "wrapper"
objects to implement the references: you'd end up with lots of "holes"
in memory.
Post by Frank Benoit
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete. But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.
I think he means that it doesn't make latency just disappear; it spreads
it over time. It's the old "drop a frog into hot water and it'll jump
out, drop it in cold water and then boil it and it won't notice" anecdote.

-- Daniel
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Lionello Lunesu
2006-06-05 05:39:25 UTC
Permalink
Post by Daniel Keep
1. ref--;
2. if( ref == 0 ) free(ptr);
3. if( obj.canHaveCycles ) registerForCycleCheck(obj);
If not more. You then also need the incs synched with that, otherwise
you could end up "ref++"ing in the middle of "free(ptr)".
The thread doing the ++ apparently has a pointer to the object without a
reference. How could that ever happen?

L.
Daniel Keep
2006-06-05 05:52:19 UTC
Permalink
Post by Lionello Lunesu
Post by Daniel Keep
1. ref--;
2. if( ref == 0 ) free(ptr);
3. if( obj.canHaveCycles ) registerForCycleCheck(obj);
If not more. You then also need the incs synched with that, otherwise
you could end up "ref++"ing in the middle of "free(ptr)".
The thread doing the ++ apparently has a pointer to the object without a
reference. How could that ever happen?
L.
*slaps forehead*

Of course; you're right. I was thinking about race conditions, but
obviously got confused.

Please refer to my signature. Specifically the "it may not even make
sense." part :P

-- Daniel
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Frank Benoit
2006-06-05 08:06:23 UTC
Permalink
Post by Daniel Keep
If not more. You then also need the incs synched with that, otherwise
you could end up "ref++"ing in the middle of "free(ptr)".
As already pointed out by Lionello, if the counter goes to zero there is
no one else manipulating it.
Post by Daniel Keep
Post by Frank Benoit
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?
MS=mark&sweep :)
Walter Bright
2006-06-05 03:06:24 UTC
Permalink
Post by Frank Benoit
Ok, i see now that my "best and simplest" solution is neiter of both :)
But I don't want to throw the idea away so fast. Please lets discuss it
a bit further.
I'm not throwing it away so fast, I've thought about it for several
years <g>.
Post by Frank Benoit
Let's think about the scenario of having reference counting (RC) and
mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
Python. If Python uses this, the idea cannot be so crazy.
Maybe, maybe not. Python isn't known for its speed. Java uses gc, not rc.
Post by Frank Benoit
Post by Walter Bright
1) cyclical data structures won't get free'd
Combine RC and MS. So cycles are detected. It should be possible to
avoid cycles at all, if you don't want to have any MS pause.
There are ways to avoid the RC cycle problem, but I don't know how
expensive they are.
Post by Frank Benoit
Post by Walter Bright
2) every pointer copy requires an increment and a corresponding
decrement - including when simply passing a reference to a function
This is true for a smart pointer class. But is it really true for a
compiler supported RC?
Yes.
Post by Frank Benoit
If I call a func/method, I hold a valid reference to the object. While
this reference is copied down the stack, these are only temporary
reference. And all of them are release before the call returns to the
caller. You can ignore them all for RC.
This fails if the passed reference 'escapes'. It can escape by being
assigned to a global, being assigned to a class member, being inserted
into some data structure that lives longer than the function, and being
returned by the function.
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Post by Frank Benoit
(Don't know anything about the
performance of the asm "lock" instruction)
It's slow enough to be noticable.
Post by Frank Benoit
Post by Walter Bright
4) exception handlers (finally's) must be inserted to handle all the
decs so there are no leaks. Contrary to assertions otherwise, there is
no such thing as "zero overhead exceptions."
Yes, this is additional overhead in code size and runtime. And there are
additional post destructors needed, to set all member refs to null,
which wasn't nulled by the dtor.
Post by Walter Bright
5) in order to support slicing and interior pointers, as well as
supporting reference counting on arbitrary allocations of non-object
data, a separate "wrapper" object must be allocated for each allocation
to be ref counted. This essentially doubles the number of allocations
needed.
The MS GC allready has a memory registry. Isn't there stored, where an
object begins? Let the ref counter be in this registry. This should work
with all types and slicing and object member references.
Yes, you can find out the beginning of an arbitrary pointer's memory
block. But that isn't a cheap operation.
Post by Frank Benoit
Post by Walter Bright
6) The wrapper object will mean that all pointers will need to be
double-dereferenced to access the data.
=> 5.)
Finding the start of the memory chunk will be several times more
expensive than the double indirect.
Post by Frank Benoit
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?
I don't know anything about the MS GC.
Post by Frank Benoit
Post by Walter Bright
8) Ref counting can fragment the heap thereby consuming more memory just
like the gc can, though the gc typically will consume more memory overall.
Isn't the actual GC implementation also fragmenting the heap?
Yes. malloc will fragment the heap, too. But only GC has the hope of
doing compaction.
Post by Frank Benoit
But I don't have any idea to this issue.
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete.
That's the same with gc. Only upon an allocation is the time unbounded.
Post by Frank Benoit
But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.
GC doesn't require disabling hardware interrupts. It does not need to
stop any threads that do not hold references to GC allocated data.
Post by Frank Benoit
Post by Walter Bright
The proposed C++ shared_ptr<>, which implements ref counting, suffers
from all these faults. I haven't seen a heads up benchmark of
shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
turned out to be a significant loser in terms of both performance and
memory consumption.
That said, I'd like for D to optionally support some form of ref
counting, as rc is better for managing scarce resources like file handles.
As I said, I would be happy to be able to switch the whole runtime
system to RC. If I would loose 50% performance this is acceptable for me
if I get latency < 100us.
Latency cannot be guaranteed even with malloc().

I don't know what constraints you have on the system you're developing.
But when I've written real time interrupt code, the ISRs were written to
not make any OS calls or any allocations. All they'd do is just gather
data and post it to a FIFO buffer. A non-realtime thread would then pull
the data out of the buffer and process it. The system would work without
dropping data as long as the average (not maximum) processing time per
datum was less than the data acquisition rate.
Frank Benoit
2006-06-05 08:29:54 UTC
Permalink
Post by Walter Bright
I'm not throwing it away so fast, I've thought about it for several
years <g>.
I meant myself, hehe.
Post by Walter Bright
Post by Frank Benoit
If I call a func/method, I hold a valid reference to the object. While
this reference is copied down the stack, these are only temporary
reference. And all of them are release before the call returns to the
caller. You can ignore them all for RC.
This fails if the passed reference 'escapes'. It can escape by being
assigned to a global, being assigned to a class member, being inserted
into some data structure that lives longer than the function, and being
returned by the function.
Yes, and the compiler can catch these cases. By incrementing/decrementing
the parameter reference you win nothing, so it should be possible to
optimize them away.
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Post by Frank Benoit
(Don't know anything about the
performance of the asm "lock" instruction)
It's slow enough to be noticable.
Yes, you can find out the beginning of an arbitrary pointer's memory
block. But that isn't a cheap operation.
yikes, you are absolutely right.
So here comes the uglier solution:
Every pointer is always a struct with the pointer to the data and a
pointer to the refcounter.
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?
I don't know anything about the MS GC.
you do :) MS=mark&sweep
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete.
That's the same with gc. Only upon an allocation is the time unbound.
And additional the time is unbound for the GC fullcollect...
Post by Walter Bright
Post by Frank Benoit
But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.
GC doesn't require disabling hardware interrupts. It does not need to
stop any threads that do not hold references to GC allocated data.
If you want to have your ISR written in D, you have two choices:
a) be sure that you do not move a reference (refb=refa; refa=null;).
This can end up with the GC deleting a living object. If you cannot
guarantee this, then
b) disable the interrupting code while the GC is running. But this
is not acceptable, so go back to a) or use another GC.
Post by Walter Bright
Latency cannot be guaranteed even with malloc().
I don't know what constraints you have on the system you're developing.
But when I've written real time interrupt code, the ISRs were written to
not make any OS calls or any allocations. All they'd do is just gather
data and post it to a FIFO buffer. A non-realtime thread would then pull
the data out of the buffer and process it. The system would work without
dropping data as long as the average (not maximum) processing time per
datum was less than the data acquisition rate.
Yes, I will also not do any OS call, nor will I do a "new". And I
can easily check the code for not doing this. I can even add a check to
the GC allocator that it is not called in an ISR.
But a moving reference can corrupt the working GC. And there is no way
to check for that.
Walter Bright
2006-06-05 10:08:04 UTC
Permalink
Post by Frank Benoit
Post by Walter Bright
This fails if the passed reference 'escapes'. It can escape by being
assigned to a global, being assigned to a class member, being inserted
into some data structure that lives longer than the function, and being
returned by the function.
Yes, and the compiler can catch these cases. With incr/decr the
parameter reference you win nothing. So it should be possible to
optimize them.
It can only do it if it has full source available. It also won't work
for virtual functions, as it is not known at compile time what code will
be called.
Post by Frank Benoit
Every pointer is allways a struct with the pointer to the data and a
pointer to the refcounter.
It's the wrapper approach, with two allocations per allocation.
Post by Frank Benoit
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
7) Fixing the compiler to hide all this stuff from the programmer will
make it difficult to interface cleanly with C.
hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?
I don't know anything about the MS GC.
you do :) MS=mark&sweep
I thought you meant MS=Microsoft <g>. No, it isn't a problem, as it
scans the C data segment and stack, too.
Post by Frank Benoit
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete.
That's the same with gc. Only upon an allocation is the time unbound.
And additional the time is unbound for the GC fullcollect...
The collector only runs during an allocation request - which means you
can completely control when a collection can happen.
Post by Frank Benoit
Post by Walter Bright
Post by Frank Benoit
But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.
GC doesn't require disabling hardware interrupts. It does not need to
stop any threads that do not hold references to GC allocated data.
a) be sure that you do not move a reference (refb=refa; refa=null;).
This can end up in GC deleting a living object. If you cannot guarantee
this, than
b) disable the interrupting code also while the GC is running. But this
is not acceptable, so go back to a) or have another GC.
c) code the ISR so it does not reference any gc allocated data. For
example, use malloc() for dynamic data the ISR needs.

d) make sure any gc references used by the ISR have another reference to
them that the gc will scan (such as in the global data segment).
Kevin Bealer
2006-06-07 17:18:33 UTC
Permalink
In article <e60vmu$1b6t$1 at digitaldaemon.com>, Walter Bright says...
..
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
Post by Frank Benoit
Post by Walter Bright
9) Ref counting does not eliminated latency problems, it just reduces them.
Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete.
That's the same with gc. Only upon an allocation is the time unbound.
And additional the time is unbound for the GC fullcollect...
The collector only runs during an allocation request - which means you
can completely control when a collection can happen.
Correct me if I'm wrong, but while this is true for M + S, doesn't RC /kind of/
have the opposite problem? In refcounting systems, allocation would be mostly
predictable, a la malloc, but dereferences can trigger a long chain of other
dereferences.

I say 'kind of' because it is more deterministic than GC - you can always dig
deeper to see what is going to be freed by a given dereference.

For instance, if you are the last person to let go of a hash table, you have to
wait for the whole thing to unravel, along with any objects stored inside.

I guess a real-time RC implementation could be made by keeping a 'recycling bin'
and only taking apart N objects at a time, i.e. incremental freeing. This would
un-guarantee the prompt running of destructors though (which seems to be one of
the big rationales for RC in D, right, at least for non-memory objects?)

Kevin
Bruno Medeiros
2006-06-07 19:03:15 UTC
Permalink
Post by Walter Bright
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Are you sure? It says in
http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
"In a multi-threaded system, if threads are preemptively scheduled,
reference counting requires additional care to ensure that the updates
to reference counts are atomic. This is straightforward using hardware
primitives such as fetch-and-add, compare-and-swap, or
load-linked/store-conditional, but it is definitely necessary to pay
attention to this detail." ?
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sean Kelly
2006-06-07 19:19:53 UTC
Permalink
Post by Bruno Medeiros
Post by Walter Bright
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Are you sure? It says in
"In a multi-threaded system, if threads are preemptively scheduled,
reference counting requires additional care to ensure that the updates
to reference counts are atomic. This is straightforward using hardware
primitives such as fetch-and-add, compare-and-swap, or
load-linked/store-conditional, but it is definitely necessary to pay
attention to this detail." ?
These are synchronizing operations.


Sean
Bruno Medeiros
2006-06-10 16:26:28 UTC
Permalink
Post by Sean Kelly
Post by Bruno Medeiros
Post by Walter Bright
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Are you sure? It says in
"In a multi-threaded system, if threads are preemptively scheduled,
reference counting requires additional care to ensure that the updates
to reference counts are atomic. This is straightforward using hardware
primitives such as fetch-and-add, compare-and-swap, or
load-linked/store-conditional, but it is definitely necessary to pay
attention to this detail." ?
These are synchronizing operations.
Sean
By "That requires synchronizing." I thought Walter meant stuff with
mutexes, etc. (i.e., non-trivial stuff), because can't those hardware
synchronizing operations be called "atomic inc/dec"?
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sean Kelly
2006-06-10 18:39:35 UTC
Permalink
Post by Bruno Medeiros
Post by Sean Kelly
Post by Bruno Medeiros
Post by Walter Bright
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't an atomic inc/dec enough?
That requires synchronizing.
Are you sure? It says in
"In a multi-threaded system, if threads are preemptively scheduled,
reference counting requires additional care to ensure that the
updates to reference counts are atomic. This is straightforward using
hardware primitives such as fetch-and-add, compare-and-swap, or
load-linked/store-conditional, but it is definitely necessary to pay
attention to this detail." ?
These are synchronizing operations.
By "That requires synchronizing." I thought Walter meant stuff with
mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware
synchronizing operations be called "atomic inc/dec" ?
For the most part. There are really two issues here. First, the
operation must be atomic with respect to other threads. The x86
actually guarantees this for all load/store operations on aligned data
<= the bus size. Second, the operations must be observed in the proper
order by other threads--ie. the compiler and all involved CPUs (reader
and writer) are not allowed to reorder the operations or do any other
fancy tricks that may result in a load/store order that isn't the same
as program order. For the compiler, this is where volatile comes in,
and at the hardware level, a synchronizing operation may be required
depending on the situation and on the hardware involved. Again, with
x86 you can get away without a LOCK a lot of the time, but not always.
So you really must assume that any competing operation has a worst case
cost of... well, a lot.


Sean
Bruno Medeiros
2006-06-12 18:16:54 UTC
Permalink
Post by Sean Kelly
Post by Bruno Medeiros
Post by Sean Kelly
Post by Bruno Medeiros
Post by Walter Bright
Post by Frank Benoit
Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't a atomic inc/dec enough?
That requires synchronizing.
Are you sure? It says in
"In a multi-threaded system, if threads are preemptively scheduled,
reference counting requires additional care to ensure that the
updates to reference counts are atomic. This is straightforward
using hardware primitives such as fetch-and-add, compare-and-swap,
or load-linked/store-conditional, but it is definitely necessary to
pay attention to this detail." ?
These are synchronizing operations.
By "That requires synchronizing." I thought Walter meant stuff with
mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware
synchronizing operations be called "atomic inc/dec" ?
For the most part. There are really two issues here. First, the
operation must be atomic with respect to other threads. The x86
actually guarantees this for all load/store operations on aligned data
<= the bus size. Second, the operations must be observed in the proper
order by other threads--ie. the compiler and all involved CPUs (reader
and writer) are not allowed to reorder the operations or do any other
fancy tricks that may result in a load/store order that isn't the same
as program order. For the compiler, this is where volatile comes in,
and at the hardware level, a synchronizing operation may be required
depending on the situation and on the hardware involved. Again, with
x86 you can get away without a LOCK a lot of the time, but not always.
So you really must assume that any competing operation has a worst case
cost of... well, a lot.
Sean
Ah, I see. I had recently read about relative out-of-order execution
problems in the Memory Barrier wikipedia entry, and I got the impression
(out of nowhere) that the hardware took care of that transparently
(i.e., without program intervention), but then, that is not the case,
not always, at least?
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sean Kelly
2006-06-14 02:40:07 UTC
Permalink
Post by Bruno Medeiros
Ah, I see. I had recently read about relative out-of-order execution
problems in the Memory Barrier wikipedia entry, and I got the impression
(out of nowhere) that the hardware took care of that transparently
(i.e., without program intervention), but then, that is not the case,
not always, at least?
A single CPU is allowed to do whatever it wants so long as it can fool
the user into thinking it's executing instructions in a purely
sequential manner. However, the only information other CPUs in the
system have is what they observe on the system bus. Obviously, so long
as CPUs aren't sharing data there aren't any problems. But things get
sticky when this isn't the case. The memory model defines observable
behavior allowed for a given architecture, as well as any methods
offered for affecting that behavior.

Say a specific architecture can operate much more quickly if it is
allowed to perform writes to memory (from cache) in order of ascending
address rather than in the order they were issued in code. There's no
way to lie to other CPUs about the order in which writes occur and still
have the optimization have any effect, so the designers state in the
memory model spec that this architecture is allowed to reorder writes
and then typically provide some means for overriding this behavior
should it prove necessary.

Now let's suppose you have two threads doing something like this:

thread/CPU 1:

A = 1;
B = 2;

thread/CPU 2:

if( A == 1 )
{
assert( B == 2 );
}

Given the order in which A and B were declared, and therefore the order
in which the writes occur, this assert may or may not fail.

Enter memory barriers. Memory barriers are simply a way for the
programmer to tell the CPU "I don't care if your way is faster, A simply
must be written to memory before B or thread 2 will explode." So the
CPU behind thread 1 does as it's told at great expense to performance
and thread 2 doesn't melt down.

The sticky part is that hardware designers don't agree with one another
on how things should work and they never take the advice of the software
people, so all architectures have different sets of observable behavior
and different methods for working around it when necessary. However,
the common concept is that memory barriers all constrain the order in
which memory accesses may occur with respect to each other. Think of it
as an assertion that "X may not occur before Y" or "X may not occur
after Y" at the instruction level.

The x86 is actually a bit weird in this regard as it has no formal
memory barriers for normal operations (though it has the FENCE
instructions for SSE use). I think this is largely for historical
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
this mattered, and the memory model has always been fairly strict (it
was actually sequential until not terribly long ago). Also, the LOCK
instruction acts as a heavy-handed sort of memory barrier as well, so
there has been little motivation to add new instructions for
finer-grained control.


Sean
Daniel Keep
2006-06-14 10:45:21 UTC
Permalink
Post by Sean Kelly
Post by Bruno Medeiros
Ah, I see. I had recently read about relative out-of-order execution
problems in the Memory Barrier wikipedia entry, and I got the
impression (out of nowhere) that the hardware took care of that
transparently (i.e., without program intervention), but then, that is
not the case, not always at least?
A single CPU is allowed to do whatever it wants so long as it can fool
the user into thinking it's executing instructions in a purely
sequential manner. However, the only information other CPUs in the
system have is what they observe on the system bus. Obviously, so long
as CPUs aren't sharing data there aren't any problems. But things get
sticky when this isn't the case. The memory model defines observable
behavior allowed for a given architecture, as well as any methods
offered for affecting that behavior.
Say a specific architecture can operate much more quickly if it is
allowed to perform writes to memory (from cache) in order of ascending
address rather than in the order they were issued in code. There's no
way to lie to other CPUs about the order in which writes occur and still
have the optimization have any effect, so the designers state in the
memory model spec that this architecture is allowed to reorder writes
and then typically provide some means for overriding this behavior
should it prove necessary.
A = 1;
B = 2;
if( A == 1 )
{
assert( B == 2 );
}
Given the order in which a and b were declared, and therefore the order
in which the writes occur, this assert may or may not fail.
Enter memory barriers. Memory barriers are simply a way for the
programmer to tell the CPU "I don't care if your way is faster, A simply
must be written to memory before B or thread 2 will explode." So the
CPU behind thread 1 does as it's told at great expense to performance
and thread 2 doesn't melt down.
The sticky part is that hardware designers don't agree with one another
on how things should work and they never take the advice of the software
people, so all architectures have different sets of observable behavior
and different methods for working around it when necessary. However,
the common concept is that memory barriers all constrain the order in
which memory accesses may occur with respect to each other. Think of it
as an assertion that "X may not occur before Y" or "X may not occur
after Y" at the instruction level.
The x86 is actually a bit weird in this regard as it has no formal
memory barriers for normal operations (though it has the FENCE
instructions for SSE use). I think this is largely for historical
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
this mattered, and the memory model has always been fairly strict (it
was actually sequential until not terribly long ago). Also, the LOCK
instruction acts as a heavy-handed sort of memory barrier as well, so
there has been little motivation to add new instructions for
finer-grained control.
Sean
Cool; learn something new every day. Thanks for the informative post.

-- Daniel
--
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Bruno Medeiros
2006-06-14 17:14:44 UTC
Permalink
Post by Sean Kelly
Post by Bruno Medeiros
Ah, I see. I had recently read about relative out-of-order execution
problems in the Memory Barrier wikipedia entry, and I got the
impression (out of nowhere) that the hardware took care of that
transparently (i.e., without program intervention), but then, that is
not the case, not allways at least?
A single CPU is allowed to do whatever it wants so long as it can fool
the user into thinking it's executing instructions in a purely
sequential manner. However, the only information other CPUs in the
system have is what they observe on the system bus. Obviously, so long
as CPUs aren't sharing data there aren't any problems. But things get
sticky when this isn't the case. The memory model defines observable
behavior allowed for a given architecture, as well as any methods
offered for affecting that behavior.
Say a specific architecture can operate much more quickly if it is
allowed to perform writes to memory (from cache) in order of ascending
address rather than in the order they were issued in code. There's no
way to lie to other CPUs about the order in which writes occur and still
have the optimization have any effect, so the designers state in the
memory model spec that this architecture is allowed to reorder writes
and then typically provide some means for overriding this behavior
should it prove necessary.
A = 1;
B = 2;
if( A == 1 )
{
assert( B == 2 );
}
Given the order in which a and b were declared, and therefore the order
in which the writes occur, this assert may or may not fail.
Enter memory barriers. Memory barriers are simply a way for the
programmer to tell the CPU "I don't care if your way is faster, A simply
must be written to memory before B or thread 2 will explode." So the
CPU behind thread 1 does as it's told at great expense to performance
and thread 2 doesn't melt down.
The sticky part is that hardware designers don't agree with one another
on how things should work and they never take the advice of the software
people, so all architectures have different sets of observable behavior
and different methods for working around it when necessary. However,
the common concept is that memory barriers all constrain the order in
which memory accesses may occur with respect to each other. Think of it
as an assertion that "X may not occur before Y" or "X may not occur
after Y" at the instruction level.
The x86 is actually a bit weird in this regard as it has no formal
memory barriers for normal operations (though it has the FENCE
instructions for SSE use). I think this is largely for historical
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
this mattered, and the memory model has always been fairly strict (it
was actually sequential until not terribly long ago). Also, the LOCK
instruction acts as a heavy-handed sort of memory barrier as well, so
there has been little motivation to add new instructions for
finer-grained control.
Sean
Nice post!
Makes me think, how does one keep up with this? I mean, one who isn't
(nor wishes to be) a hardware expert, but wants to keep up with the
general developments in this area, thus maintaining an overview of it.
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sean Kelly
2006-06-14 19:00:21 UTC
Permalink
Post by Bruno Medeiros
Makes me think, how does one keep up with this? I mean, one who isn't
(nor wishes to be) a hardware expert, but wants to keep up with the
general developments in this area, thus maintaining an overview of it.
comp.programming.threads is worth keeping an eye on, though the jargon
can get a bit thick there at times. The C++ committee is also working
on a new memory model for C++, so any discussion there may be useful.
The easiest way to keep an eye on this is follow the links from Hans
Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you
could keep an eye on comp.std.c++ as well if you're having trouble
staying awake at night. Finally, any paper with Maurice Herlihy's name
on it is a useful resource if you want a deeper understanding of some of
these ideas and don't mind some research. He's got a website with links
to many of his papers, though IIRC some of them are hard to track down
without an ACM membership.


Sean
Lionello Lunesu
2006-06-14 19:38:57 UTC
Permalink
Interesting indeed!

I'm currently reading Boehm's article "Threads Cannot be Implemented as a
Library". I wonder if this means that threads should become primitives of
some sort [in D] ?

L.
Sean Kelly
2006-06-14 22:49:49 UTC
Permalink
Post by Lionello Lunesu
Interesting indeed!
I'm currently reading Boehm's article "Threads Cannot be Implemented as a
Library". I wonder if this means that threads should become primitives of
some sort [in D] ?
It's been a while since I read that article, but I think D already has
enough in-language recognition of threading issues to escape most/all of
the problems Boehm mentions: Thread is a standard library component
rather than a third-party component, synchronized is available for
mutual exclusion, and volatile (plus perhaps inline asm code) is enough
to manage lock-free work. A more robust multithread-aware memory model
would be good to have, but it's a sticky enough issue that I think
Walter is right for waiting on that for now.


Sean
Lionello Lunesu
2006-06-15 11:22:29 UTC
Permalink
Post by Sean Kelly
Post by Lionello Lunesu
Interesting indeed!
I'm currently reading Boehm's article "Threads Cannot be Implemented
as a Library". I wonder if this means that threads should become
primitives of some sort [in D] ?
It's been a while since I read that article, but I think D already has
enough in-language recognition of threading issues to escape most/all of
the problems Boehm mentions: Thread is a standard library component
rather than a third-party component, synchronized is available for
mutual exclusion, and volatile (plus perhaps inline asm code) is enough
to manage lock-free work. A more robust multithread-aware memory model
would be good to have, but it's a sticky enough issue that I think
Walter is right for waiting on that for now.
...that leaves only atomic operations? Or would "volatile" or
"synchronized" take care of memory fencing and such?

Would be nice if there were a built-in type with overloaded ++x, x++,
x=y, all atomic. (I've seen some C++ templates that do this.)

L.
Sean Kelly
2006-06-15 15:45:53 UTC
Permalink
Post by Lionello Lunesu
Post by Sean Kelly
Post by Lionello Lunesu
Interesting indeed!
I'm currently reading Boehm's article "Threads Cannot be Implemented
as a Library". I wonder if this means that threads should become
primitives of some sort [in D] ?
It's been a while since I read that article, but I think D already has
enough in-language recognition of threading issues to escape most/all
of the problems Boehm mentions: Thread is a standard library component
rather than a third-party component, synchronized is available for
mutual exclusion, and volatile (plus perhaps inline asm code) is
enough to manage lock-free work. A more robust multithread-aware
memory model would be good to have, but it's a sticky enough issue
that I think Walter is right for waiting on that for now.
...that leaves only atomic operations? Or would "volatile" or
"synchronized" take care of memory fencing and such?
"volatile" is actually intended to address compiler optimizations in a
similar way to how memory barriers address CPU optimizations. It is a
necessary part of any lock-free operation in D. "synchronized" is used
for mutual exclusion and typically involves a mutex, so while it should
have the proper effect, it's not lock-free.
Post by Lionello Lunesu
Would be nice if there were a built-in type with overloaded ++x, x++,
x=y, all atomic. (I've seen some C++ templates that do this.)
It would be nice, but I've yet to see a proposal that I like. The
problem is that in addition to pure atomic operations (which is how the
x86 typically works by default) lock-free programmers typically want to
make an assertion about instruction ordering as well, and built-in
semantics don't expose this very cleanly. One could simply have
volatile types similar to Java where no optimization is allowed on them
at all, but this may be too heavy-handed to satisfy some folks. For
now, I think a library solution is a reasonable alternative. Ares has
had one for quite a while and it's almost standalone so it could be used
with Phobos with very little effort. The documentation is here (DDoc
seems to want to generate docs for private template functions, so I
apologize for the clutter):

http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html

And the code itself is here:

http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d

It currently only works on the x86 and is really intended for API
developers, but it does the trick and I'll expand it as needed. For
typical use, msync.acq (acquire), msync.rel (release), and msync.none
are the flags you're likely to care about. The other more fine-grained
flags just alias acq/rel on x86 anyway. In case the docs are confusing,
the API calls supported are:

load
store
storeIf (ie. CAS)
increment
decrement
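For readers unfamiliar with the storeIf (CAS) primitive in that list, here is a hedged C++ sketch of its semantics; the function name is illustrative, not part of the Ares API:

```cpp
#include <atomic>

// Illustrative increment built from storeIf/CAS semantics: the loop
// retries until no other thread modified the value between our load
// and our store, which makes the whole read-modify-write atomic.
int incrementViaCAS(std::atomic<int>& v) {
    int old = v.load();
    // "Store old+1, but only if v still equals old"; on failure,
    // old is reloaded with the current value and we retry.
    while (!v.compare_exchange_weak(old, old + 1)) {
    }
    return old + 1;
}
```

This retry loop is the building block beneath both smart-pointer reference counts and the lock-free containers mentioned elsewhere in the thread.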


Sean
Lionello Lunesu
2006-06-16 09:55:27 UTC
Permalink
Post by Sean Kelly
Post by Lionello Lunesu
...that leaves only atomic operations? Or would "volatile" or
"synchronized" take care of memory fencing and such?
"volatile" is actually intended to address compiler optimizations in a
similar way to how memory barriers address CPU optimizations. It is a
necessary part of any lock-free operation in D. "synchronized" is used
for mutual exclusion and typically involves a mutex, so while it should
have the proper effect, it's not lock-free.
So I suppose "volatile" could be extended to include memory locking as well!

// instructions inside this block are 'lock'ed
volatile(atomic) {
++a;
a = x;
}

(or perhaps even "synchronized(memory) {...}", but I think extending
volatile makes more sense).

Come to think of it, doesn't "synchronized" also imply "volatile"?

Or, another possibility would be to add the volatile declaration as in
C, but then actually locking all access to the variable:

volatile int i;
i++;// atomic

Seems to me it should be possible to extend D with a built-in and
portable construct for these atomic operations.
Post by Sean Kelly
It would be nice, but I've yet to see a proposal that I like. The
problem is that in addition to pure atomic operations (which is how the
x86 typically works by default) lock-free programmers typically want to
make an assertion about instruction ordering as well, and built-in
semantics don't expose this very cleanly. One could simply have
volatile types similar to Java where no optimization is allowed on them
at all, but this may be too heavy-handed to satisfy some folks. For
now, I think a library solution is a reasonable alternative. Ares has
had one for quite a while and it's almost standalone so it could be used
with Phobos with very little effort. The documentation is here (DDoc
seems to want to generate docs for private template functions, so I
http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d
Thanks, I'll have a look at them.

L.
Sean Kelly
2006-06-16 19:06:01 UTC
Permalink
Post by Lionello Lunesu
Post by Sean Kelly
Post by Lionello Lunesu
...that leaves only atomic operations? Or would "volatile" or
"synchronized" take care of memory fencing and such?
"volatile" is actually intended to address compiler optimizations in a
similar way to how memory barriers address CPU optimizations. It is a
necessary part of any lock-free operation in D. "synchronized" is
used for mutual exclusion and typically involves a mutex, so while it
should have the proper effect, it's not lock-free.
So I suppose "volatile" could be extended to include memory locking as well!
Yes it could, though for now I prefer it the way it is as it offers more
control over exactly what's going on. It helps tremendously that D has
such good inline asm support.
Post by Lionello Lunesu
// instructions inside this block are 'lock'ed
volatile(atomic) {
++a;
a = x;
}
(or perhaps even "synchronized(memory) {...}", but I think extending
volatile makes more sense).
Come to think of it, doesn't "synchronized" also imply "volatile"?
Yes. Synchronization mechanisms rely on memory barriers to work
properly. In fact, it's been suggested in the past on c.p.t that an
empty mutex block:

synchronized {}

could be used as a high-level sort of bidirectional memory barrier,
except for the risk of the compiler optimizing the call away entirely.
Post by Lionello Lunesu
Or, another possibility would be to add the volatile declaration as in
volatile int i;
i++;// atomic
This is sort of how Java implements volatile and it wouldn't surprise me
if C++ did something similar. Right now, the "volatile" qualifier
offers basically no language guarantees for concurrent programming, as
it was actually intended for accessing interrupt data IIRC.
Post by Lionello Lunesu
Seems to me it should be possible to extend D with a built-in and
portable construct for these atomic operations.
Definitely. The trouble is coming up with a good design :-) I think
the C++ team is definitely qualified to do so, but they may have to make
some sacrifices in the interest of time.


Sean
Don Clugston
2006-06-16 11:49:02 UTC
Permalink
Post by Sean Kelly
Post by Lionello Lunesu
Post by Sean Kelly
Post by Lionello Lunesu
Interesting indeed!
I'm currently reading Boehm's article "Threads Cannot be Implemented
as a Library". I wonder if this means that threads should become
primitives of some sort [in D] ?
It's been a while since I read that article, but I think D already
has enough in-language recognition of threading issues to escape
most/all of the problems Boehm mentions: Thread is a standard library
component rather than a third-party component, synchronized is
available for mutual exclusion, and volatile (plus perhaps inline asm
code) is enough to manage lock-free work. A more robust
multithread-aware memory model would be good to have, but it's a
sticky enough issue that I think Walter is right for waiting on that
for now.
...that leaves only atomic operations? Or would "volatile" or
"synchronized" take care of memory fencing and such?
"volatile" is actually intended to address compiler optimizations in a
similar way to how memory barriers address CPU optimizations. It is a
necessary part of any lock-free operation in D. "synchronized" is used
for mutual exclusion and typically involves a mutex, so while it should
have the proper effect, it's not lock-free.
Post by Lionello Lunesu
Would be nice if there were a built-in type with overloaded ++x, x++,
x=y, all atomic. (I've seen some C++ templates that do this.)
It would be nice, but I've yet to see a proposal that I like. The
problem is that in addition to pure atomic operations (which is how the
x86 typically works by default) lock-free programmers typically want to
make an assertion about instruction ordering as well, and built-in
semantics don't expose this very cleanly. One could simply have
volatile types similar to Java where no optimization is allowed on them
at all, but this may be too heavy-handed to satisfy some folks. For
now, I think a library solution is a reasonable alternative. Ares has
had one for quite a while and it's almost standalone so it could be used
with Phobos with very little effort.
I'd really like to see this included in the standard distribution.
Has anyone made a general-purpose lock-free linked list? Just a
low-level list of void * pointers would be fantastic.
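One classic answer to this request is the Treiber stack: a singly-linked structure of void* values where both operations are a single CAS retry loop. A hedged C++ sketch follows; it deliberately ignores the ABA problem and safe node reclamation, which any production version must solve (e.g. with hazard pointers):

```cpp
#include <atomic>

struct Node {
    void* value;
    Node* next;
};

struct LockFreeStack {
    std::atomic<Node*> head{nullptr};

    void push(void* v) {
        Node* n = new Node{v, head.load()};
        // Retry until no other thread changed head between reading it
        // and linking our node in; on failure, n->next is refreshed.
        while (!head.compare_exchange_weak(n->next, n)) {
        }
    }

    void* pop() {
        Node* n = head.load();
        // On CAS failure, n is refreshed with the current head.
        while (n && !head.compare_exchange_weak(n, n->next)) {
        }
        if (!n) return nullptr;
        void* v = n->value;
        // NOTE: freeing n here is unsafe without a reclamation scheme;
        // this sketch deliberately leaks the node instead.
        return v;
    }
};
```

Usage is the obvious push/pop; the point is that neither operation ever blocks another thread, only retries.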

The documentation is here (DDoc
Post by Sean Kelly
seems to want to generate docs for private template functions, so I
http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d
It currently only works on the x86 and is really intended for API
developers, but it does the trick and I'll expand it as needed. For
typical use, msync.acq (acquire), msync.rel (release), and msync.none
are the flags you're likely to care about. The other more fine-grained
flags just alias acq/rel on x86 anyway. In case the docs are confusing,
load
store
storeIf (ie. CAS)
increment
decrement
Sean
Sean Kelly
2006-06-16 19:12:31 UTC
Permalink
Post by Don Clugston
I'd really like to see this included in the standard distribution.
Has anyone made a general-purpose lock-free linked list? Just a
low-level list of void * pointers would be fantastic.
Mango contains some of Doug Lea's containers and there may be one there.
If not, it shouldn't be tremendously difficult to implement or port
one. I'm not sure I have the time right now, but if someone wants to
use the atomic package in Ares they're more than welcome to.


Sean
Bruno Medeiros
2006-06-17 21:21:45 UTC
Permalink
Post by Sean Kelly
Post by Bruno Medeiros
Makes me think, how does one keep up with this? I mean, one who isn't
(nor wishes to be) a hardware expert, but wants to keep up with the
general developments in this area, thus maintaining an overview of it.
comp.programming.threads is worth keeping an eye on, though the jargon
can get a bit thick there at times. The C++ committee is also working
on a new memory model for C++, so any discussion there may be useful.
The easiest way to keep an eye on this is follow the links from Hans
Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you
could keep an eye on comp.std.c++ as well if you're having trouble
staying awake at night. Finally, any paper with Maurice Herlihy's name
on it is a useful resource if you want a deeper understanding of some of
these ideas and don't mind some research. He's got a website with links
to many of his papers, though IIRC some of them are hard to track down
without an ACM membership.
Sean
Well, that's the thing, I just want to have some general knowledge about
this area of hardware and concurrency/distributed-systems, not become an
expert on it. My time to learn new things is limited (and I definitely
have no trouble going to sleep :( ), so my priority goes to learning
things that I have an immediate need to use or become involved with. If
I ever have the need to learn more in depth, I will.
You see, concurrency is very important for people interested in
server-side programming. But for multimedia programming, it's not that
important. For now that is, as it seems that with the coming of
multicore CPUs, concurrency is becoming more and more of a general
software development topic, relevant even for non-intrinsically
concurrent apps.
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sean Kelly
2006-06-17 21:58:36 UTC
Permalink
Post by Bruno Medeiros
Post by Sean Kelly
Post by Bruno Medeiros
Makes me think, how does one keep up with this? I mean, one who isn't
(nor wishes to be) a hardware expert, but wants to keep up with the
general developments in this area, thus maintaining an overview of it.
comp.programming.threads is worth keeping an eye on, though the jargon
can get a bit thick there at times. The C++ committee is also working
on a new memory model for C++, so any discussion there may be useful.
The easiest way to keep an eye on this is follow the links from Hans
Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though
you could keep an eye on comp.std.c++ as well if you're having trouble
staying awake at night. Finally, any paper with Maurice Herlihy's
name on it is a useful resource if you want a deeper understanding of
some of these ideas and don't mind some research. He's got a website
with links to many of his papers, though IIRC some of them are hard to
track down without an ACM membership.
Well, that's the thing, I just want to have some general knowledge about
this area of hardware and concurrency/distributed-systems, not become an
expert on it. My time to learn new things is limited (and I definitely
have no trouble going to sleep :( ), so my priority goes for learning
things that I have immediate need to use/become-involved. If I ever have
the need to learn more in-depth I will.
You see, concurrency is very important for people interested in
server-side programming. But for multimedia programming, it's not that
important. For now that is, as it seems that with the coming of
multicore CPUs, concurrency is becoming more and more of a general
software development topic, relevant even for non-intrinsically
concurrent apps.
Sadly, I don't know of any more basic sources of information on this
stuff. David Butenhof's "Programming with POSIX Threads" is quite good
for the basic concepts, but beyond that it's mostly research papers,
wiki topics, etc. But don't hesitate to ask on comp.programming.threads
or here if you have a question.

Sean

Sean Kelly
2006-06-05 16:17:41 UTC
Permalink
For what it's worth, incremental GC is similar to refcounting in that
the cost is distributed across pointer use to avoid the need for a
costly mark/sweep phase. I've even seen hard-realtime incremental GC
implementations, so it's a long-term solution worth considering. That
said, I would like to be able to create some sort of smart pointer in D
for tracking valuable resources, so it's good to hear that Walter is
considering this as well.
Post by Frank Benoit
Post by Walter Bright
3) in a multithreaded app, the incs and decs must be synchronized
Isn't an atomic inc/dec enough? (Don't know anything about the
performance of the asm "lock" instruction)
Depending on the platform and the synch. requirement, a "lock" type
instruction may be necessary. It typically is for non-x86 platforms,
but I think you could get away without using one in some cases on x86.
The best place to check for optimizations would be the Boost shared_ptr
code. IIRC they don't use the Interlocked calls but there are spin
locks in some places.

The cost of a "lock" instruction is variable based on the hardware,
number of CPUs, whether the cache line is shared, etc, but the most
reliable estimate I've seen averages locked ops at around 70ns.
Post by Frank Benoit
Post by Walter Bright
9) Ref counting does not eliminate latency problems, it just reduces them.
Where do you see latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbounded time to complete. But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.
I've seen analyses that suggested reference counting is actually slower
than mark/sweep when measured over the run of the program. The
advantage is the avoidance of the unbounded mark/sweep phase, with far
more expensive pointer modifications instead. As above, I'd almost
rather see incremental GC support in D (it typically requires additional
code generation on pointer modifications, and may be tricky to sort out
for C routines like memset).


Sean
Jeff Parsons
2006-06-07 08:41:38 UTC
Permalink
Post by Sean Kelly
For what it's worth, incremental GC is similar to refcounting in that
the cost is distributed across pointer use to avoid the need for a
costly mark/sweep phase. I've even seen hard-realtime incremental GC
implementations, so it's a long-term solution worth considering.
My _only_ significant beef with D so far has been the unbounded time for a
garbage collection; therefore, if an incremental garbage collector built
into D could guarantee me a set maximum time would be taken for each
garbage collection (which I would manually induce as often as possible)
then unless the efficiency hit is TERRIBLE I would be all over it.

What I'm really curious about, however, is how important this is to
_other_ people - especially Walter. Would it be reasonable to say that
it seems D has been designed largely without long-term real-time
performance in mind? Sure, there are workarounds, but I haven't seen
anything elegant yet that will let me use D for such applications
without making my memory management more complicated than it would have
been in C++. Is anything in the works?
Sean Kelly
2006-06-08 23:04:05 UTC
Permalink
Post by Jeff Parsons
What I'm really curious about, however, is how important this is to
_other_ people - especially Walter. Would it be reasonable to say that
it seems D has been designed largely without long-term real-time
performance in mind? Sure, there are workarounds, but I haven't seen
anything elegant yet that will let me use D for such applications
without making my memory management more complicated than it would have
been in C++. Is anything in the works?
The spec doesn't contain any language forbidding such an approach, but I
don't think the popular compilers will support it by default. Certainly
not any time soon, at any rate. The "problem" with incremental GC is
that it requires extra code generation for all pointer modifications to
signal the garbage collector that some re-scanning may be necessary.
This has obvious performance implications for small apps that don't ever
actually need to collect, but tends to level out as run time increases.
And, of course, the potential for progress guarantees (possible with
incremental GC, but not at all common) is a huge bonus for realtime
programming. That aside, there may be other approaches to garbage
collection that would make everyone happy and that don't require
additional code generation. If you know of one, please say so. My own
knowledge of the topic is limited to what I've read in research papers.


Sean
Larry Evans
2006-06-09 00:50:58 UTC
Permalink
[snip]
Post by Sean Kelly
Post by Jeff Parsons
performance in mind? Sure, there are workarounds, but I haven't seen
anything elegant yet that will let me use D for such applications
without making my memory management more complicated than it would
have been in C++. Is anything in the works?
The spec doesn't contain any language forbidding such an approach, but I
don't think the popular compilers will support it by default. Certainly
[snip]
Post by Sean Kelly
programming. That aside, there may be other approaches to garbage
collection that would make everyone happy and that don't require
additional code generation. If you know of one, please say so. My own
knowledge of the topic is limited to what I've read in research papers.
You could define a policy pointer which could have as one policy, the
current Boehm collector. IOW, any ptr<BW> p(T) on the stack would be a
root pointer and each data structure would have a map of ptr<BW>
locations. Similarly, there could be ptr<Copy> or ptr<IncBw> or even
ptr<deterministic> and ptr<collected> instead of the

class [collected] SomeClass{...};
class [deterministic] OtherClass{...}

proposed by Andrei here:

http://tinyurl.com/o5zpm

This, of course, would require cooperation from the compiler, but
the same would be true of:

class [collected] SomeClass{...};

it's just that some specializations, ptr<collected>, would be
predetermined and others, for example, user-defined ones,
would not.
Walter Bright
2006-06-09 05:18:24 UTC
Permalink
Post by Jeff Parsons
My _only_ significant beef with D so far has been the unbounded time for a
garbage collection; therefore, if an incremental garbage collector built
into D could guarantee me a set maximum time would be taken for each
garbage collection (which I would manually induce as often as possible)
then unless the efficiency hit is TERRIBLE I would be all over it.
If you're looking for a *guarantee*, then you cannot even use malloc().
Malloc has unbounded time, and is not predictable.

The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section, and
the real time section does no malloc's or free's.

Also, you can use malloc and free in D, in exactly the same way you
would in C.
Sean Kelly
2006-06-09 06:12:19 UTC
Permalink
Post by Walter Bright
Post by Jeff Parsons
My _only_ significant beef with D so far has been the unbounded time for
a garbage collection; therefore, if an incremental garbage collector
built into D could guarantee me a set maximum time would be taken for
each garbage collection (which I would manually induce as often as
possible) then unless the efficiency hit is TERRIBLE I would be all
over it.
If you're looking for a *guarantee*, then you cannot even use malloc().
Malloc has unbounded time, and is not predictable.
For what it's worth, I have read papers describing hard real-time
incremental garbage collectors, but I haven't yet seen one available
outside research circles. I'd like to give one a shot at some point,
but I've far too many other distractions to know when that will be.

If anyone is interested however, some of the papers are available here:

http://www.cs.utexas.edu/users/oops/papers.html

See papers 11 and 14.
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section, and
the real time section does no malloc's or free's.
This is certainly the easiest and probably the safest method.


Sean
Walter Bright
2006-06-09 08:12:59 UTC
Permalink
Post by Sean Kelly
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section, and
the real time section does no malloc's or free's.
This is certainly the easiest and probably the safest method.
Yup. It gets the job done.

Real time threads should be considered a very valuable resource, and as
such all possible computation should be removed from them and put into
non-realtime threads.
Derek Parnell
2006-06-09 06:46:03 UTC
Permalink
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section, and
the real time section does no malloc's or free's.
Also, you can use malloc and free in D, in exactly the same way you
would in C.
It seems obvious that if one doesn't want the Garbage Collector to collect
garbage, one doesn't create any garbage for it to collect. ;-)
--
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
9/06/2006 4:44:56 PM
Sean Kelly
2006-06-09 06:53:31 UTC
Permalink
Post by Derek Parnell
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section, and
the real time section does no malloc's or free's.
Also, you can use malloc and free in D, in exactly the same way you
would in C.
It seems obvious that if one doesn't want the Garbage Collector to collect
garbage, one doesn't create any garbage for it to collect. ;-)
This gets sticky if one wants to use strings in D though, as there's no
real way to make them work outside of the garbage collector mechanism.
Sure we could use C-style pointers, but it's not a terribly attractive
option. I suppose the alternative would be a class with a custom
allocator method.


Sean
Walter Bright
2006-06-09 08:35:17 UTC
Permalink
Post by Sean Kelly
Post by Derek Parnell
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section,
and the real time section does no malloc's or free's.
Also, you can use malloc and free in D, in exactly the same way you
would in C.
It seems obvious that if one doesn't want the Garbage Collector to collect
garbage, one doesn't create any garbage for it to collect. ;-)
This gets sticky if one wants to use strings in D though, as there's no
real way to make them work outside of the garbage collector mechanism.
Sure we could use C-style pointers, but it's not a terribly attractive
option.
You can use D arrays. Just allocate them using malloc (or statically
allocate them). Don't concatenate them.

Though I would ask why the real time code was allocating strings?

C is very good for writing real time code, and you can write "C"-style
code in D, and it will behave exactly like the corresponding C code. You
can call printf, strcpy, malloc, etc., which will behave just like C's
printf, strcpy, and malloc because they *are* C's functions. It will not
do a gc collect unless the code allocates gc memory, and it will ONLY do
a gc collect during such an allocation.

Heck, my old Empire game was written in C. I renamed the files from .c
to .d, did a few minor syntactic edits, and voila! it was now in D. It
runs exactly the same.

But the bottom line is that a real time thread should not be doing
storage allocation. It shouldn't be calling new, or malloc. It shouldn't
do storage allocation regardless of whether it is written in D, C, C++,
or assembler.
Sean Kelly
2006-06-09 16:31:59 UTC
Permalink
Post by Walter Bright
Post by Sean Kelly
Post by Derek Parnell
Post by Walter Bright
The approach most real time programmers use to deal with this is to
preallocate all needed data before entering the real time section,
and the real time section does no malloc's or free's.
Also, you can use malloc and free in D, in exactly the same way you
would in C.
It seems obvious that if one doesn't want the Garbage Collector to collect
garbage, one doesn't create any garbage for it to collect. ;-)
This gets sticky if one wants to use strings in D though, as there's
no real way to make them work outside of the garbage collector mechanism.
Sure we could use C-style pointers, but it's not a terribly attractive
option.
You can use D arrays. Just allocate them using malloc (or statically
allocate them). Don't concatenate them.
So then I'd just manually set the ptr and len members? Sounds
reasonable, provided everyone behaves and uses the copy-on-write idiom
you suggest.
Post by Walter Bright
Though I would ask why the real time code was allocating strings?
I was mostly thinking about this in the context of just how much
allocation pretty much must be performed by the GC for applications with
special memory requirements (gigabytes of data, for example). It just
seemed to have an overlap insofar as this sort of programming is concerned.
Post by Walter Bright
C is very good for writing real time code, and you can write "C"-style
code in D, and it will behave exactly like the corresponding C code. You
can call printf, strcpy, malloc, etc., which will behave just like C's
printf, strcpy, and malloc because they *are* C's functions. It will not
do a gc collect unless the code allocates gc memory, and it will ONLY do
a gc collect during such an allocation.
At times I've considered extending thread support to allow for the
creation of detached threads, but it simply seems too error-prone to be
worthwhile. I think you're right in that if someone is doing real-time
programming then they shouldn't expect much hand-holding, because the
assumptions involved offer too great a risk of unexpected behavior.
Post by Walter Bright
Heck, my old Empire game was written in C. I renamed the files from .c
to .d, did a few minor syntactic edits, and voila! it was now in D. It
runs exactly the same.
Yup. Though it's worth noting that D has been heavily influenced by
your particular programming style. Not at all a bad thing, but it does
imply that you are less likely than anyone else to experience code
translation problems :-) On a related note, I sometimes wonder if Sid
Meier credits you anywhere for effectively creating the genre that's
made him so successful ;-)
Post by Walter Bright
But the bottom line is that a real time thread should not be doing
storage allocation. It shouldn't be calling new, or malloc. It shouldn't
do storage allocation regardless of whether it is written in D, C, C++,
or assembler.
True enough.


Sean
Walter Bright
2006-06-09 16:42:59 UTC
Permalink
Post by Sean Kelly
Post by Walter Bright
You can use D arrays. Just allocate them using malloc (or statically
allocate them). Don't concatenate them.
So then I'd just manually set the ptr and len members? Sounds
reasonable, provided everyone behaves and uses the copy on write idiom
you suggest.
int *p = cast(int *)calloc(dim, int.sizeof);
int[] a = p[0 .. dim];
Post by Sean Kelly
Post by Walter Bright
C is very good for writing real time code, and you can write "C"-style
code in D, and it will behave exactly like the corresponding C code.
You can call printf, strcpy, malloc, etc., which will behave just like
C's printf, strcpy, and malloc because they *are* C's functions. It
will not do a gc collect unless the code allocates gc memory, and it
will ONLY do a gc collect during such an allocation.
At times I've considered extending thread support to allow for the
creation of detached threads, but it simply seems too error-prone to be
worthwhile. I think you're right in that if someone is doing real-time
programming then they shouldn't expect much hand-holding, because the
assumptions involved offer too great a risk of unexpected behavior.
I agree that realtime programming is a special type of programming, and
that someone doing it should be a more advanced programmer. You can't
just take any random code (in any language), make it realtime, and
expect it to work properly. It's got to be engineered for the
peculiarities of realtime requirements.
Post by Sean Kelly
Post by Walter Bright
Heck, my old Empire game was written in C. I renamed the files from .c
to .d, did a few minor syntactic edits, and voila! it was now in D. It
runs exactly the same.
Yup. Though it's worth noting that D has been heavily influenced by
your particular programming style. Not at all a bad thing, but it does
imply that you are less likely than anyone else to experience code
translation problems :-) On a related note, I sometimes wonder if Sid
Meier credits you anywhere for effectively creating the genre that's
made him so successful ;-)
Nope :-)
David Gileadi
2006-06-09 17:28:36 UTC
Permalink
Post by Walter Bright
Post by Jeff Parsons
My _only_ significant beef with D so far has been the unbounded time for
a garbage collection; therefore, if an incremental garbage collector
built into D could guarantee me a set maximum time would be taken for
each garbage collection (which I would manually induce as often as
possible) then unless the efficiency hit is TERRIBLE I would be all
over it.
If you're looking for a *guarantee*, then you cannot even use malloc().
Malloc has unbounded time, and is not predictable.
It seems that a fairly common desire is to use D for writing games.
Games don't typically have hard real-time requirements, but it also
doesn't look good if your screen freezes for a second or two in the
middle of play.

This is an area where an incremental collector or some kind of reference
counting solution would be handy, since (especially in large programs)
it would be difficult to avoid using garbage collected code. I realize
that the programmer has the responsibility to manage memory
intelligently, but it would be nice not to have to avoid GC code entirely.
As others have stated, it would even be worth giving up some overall
performance to gain a smooth playing experience.
John Reimer
2006-06-09 18:59:19 UTC
Permalink
Post by David Gileadi
Post by Walter Bright
Post by Jeff Parsons
My _only_ significant beef with D so far has been the unbounded time for
a garbage collection; therefore, if an incremental garbage collector
built into D could guarantee me a set maximum time would be taken for
each garbage collection (which I would manually induce as often as
possible) then unless the efficiency hit is TERRIBLE I would be all
over it.
If you're looking for a *guarantee*, then you cannot even use
malloc(). Malloc has unbounded time, and is not predictable.
It seems that a fairly common desire is to use D for writing games.
Games don't typically have hard real-time requirements, but it also
doesn't look good if your screen freezes for a second or two in the
middle of play.
This is an area where an incremental collector or some kind of reference
counting solution would be handy, since (especially in large programs)
it would be difficult to avoid using garbage collected code. I realize
that the programmer has the responsibility to manage memory
intelligently, but it would be nice not to have to avoid GC code entirely.
As others have stated, it would even be worth giving up some overall
performance to gain a smooth playing experience.
Didn't Walter point out that, in the current gc implementation, you
just have to be conscientious about when you call 'new' (which appears
to initiate a collect) if you want to avoid the gc-initiated pauses?

So in game programming, I assume it's just a matter of good organization
and planning. You make sure you allocate sufficient memory ahead of
time and avoid using 'new' in any location that would cause an ugly
pause in the game. Furthermore, you might choose appropriate times for
when you want to call a full collect manually.

As I see it, the obligation for careful design doesn't disappear when a
gc is in place. It certainly makes life easier in many ways. But one
still has to be aware of the gc shortcomings and adapt to the situation.
I still see it as a tremendous improvement over having to clean up
after every memory allocation you make (C/C++).

In short, using a gc doesn't mean you don't have to do your homework. :)

-JJR
Dave
2006-06-09 19:40:22 UTC
Permalink
Post by John Reimer
Post by David Gileadi
Post by Walter Bright
Post by Jeff Parsons
My _only_ significant beef with D so far has been the unbounded time
for a garbage collection; therefore, if an incremental garbage
collector built into D could guarantee me a set maximum time would
be taken for each garbage collection (which I would manually induce
as often as possible) then unless the efficiency hit is TERRIBLE I
would be all over it.
If you're looking for a *guarantee*, then you cannot even use
malloc(). Malloc has unbounded time, and is not predictable.
It seems that a fairly common desire is to use D for writing games.
Games don't typically have hard real-time requirements, but it also
doesn't look good if your screen freezes for a second or two in the
middle of play.
This is an area where an incremental collector or some kind of
reference counting solution would be handy, since (especially in large
programs) it would be difficult to avoid using garbage collected
code. I realize that the programmer has the responsibility to manage
memory intelligently, but it would be nice to not avoid using GC code
entirely. As others have stated, it would even be worth giving up
some overall performance to gain a smooth playing experience.
Didn't Walter point out that, in the current gc implementation, you
just have to be conscientious about when you call 'new' (which appears
to initiate a collect) if you want to avoid the gc initiated pauses.
Yea, in the current GC, allocation is the only thing that may implicitly
initiate a collection.
Post by John Reimer
So in game programming, I assume it's just a matter of good organization
and planning. You make sure you allocate sufficient memory ahead of
time and avoid using 'new' in any location that would cause an ugly
pause in the game. Furthermore, you might choose appropriate times for
when you want to call a full collect manually.
And D's easy array slicing along with array operations (like
re-initializing an array with 'arr[] = 0;') makes the code to use
pre-allocated buffers a lot easier to write too.
Post by John Reimer
As I see it, the obligation for careful design doesn't disappear when a
gc is in place. It certainly makes life easier in many ways. But one
still has to be aware of the gc shortcomings and adapt to the situation.
I still see it as a tremendous improvement over having to clean up
after every memory allocation you make (C/C++).
In short, using a gc doesn't mean you don't have to do your homework. :)
-JJR
Chad J
2006-06-12 09:21:22 UTC
Permalink
Post by David Gileadi
It seems that a fairly common desire is to use D for writing games.
Games don't typically have hard real-time requirements, but it also
doesn't look good if your screen freezes for a second or two in the
middle of play.
This is an area where an incremental collector or some kind of reference
counting solution would be handy, since (especially in large programs)
it would be difficult to avoid using garbage collected code. I realize
that the programmer has the responsibility to manage memory
intelligently, but it would be nice not to have to avoid GC code entirely.
As others have stated, it would even be worth giving up some overall
performance to gain a smooth playing experience.
This is what I'm screaming.

It is becoming popular nowadays to allow people to modify games in
various ways, one of which is modding the game's code. I don't think
you will see much of this happening inside a 3d engine, but the game
logic that sits next to the 3d engine and dictates gameplay may be open.
This allows people to change things like how the AI behaves, how much
damage you take when hit with a certain weapon, how that certain weapon
behaves, how an economy behaves, etc.

Modders may or may not be experienced programmers, so it's probably best
to assume they don't have significant experience. This has caused
plenty of games to use scripting languages for modding, since C++ is
tough to learn and not good for this stuff. I think D could do
it though. It would kick butt if D could do this, because it would make
the mods run much faster than the interpreted code that is probably
going to be used nowadays, even if you have like 50% reference counting
overhead.

Automatic memory management is a huge boon to that sort of ease of code
writing that is expected for inexperienced coders to get in on the
action and do constructive stuff. Even without the modding argument,
being able to use automatic memory management in some parts of the game
would be very advantageous in development time. It seems to me though,
that if you want to do non-pausing games in D, you just use C style
memory management. Bummer.

As David Gileadi said, games don't have strict latency requirements.
For games, garbage collection doesn't need to be deterministic, and it
doesn't need to finish in under 100 microseconds, it just needs to not
be noticeable. For a few years I've messed with RunUO which is, afaik,
written entirely in C#, and I never noticed the pauses (though the
worldsaves were annoying!). This is doable.

Also, think of all the new D programmers there would be if someone were
to release a popular game that exposed D as a scripting language :)
renox
2006-06-04 22:01:16 UTC
Permalink
Post by Frank Benoit
Another garbage collector thread :)
The best garbage collector I can think about, is a reference counting
one.
Uh, what's your definition of "best"?
The simplest? Then maybe.

If you take a look at Java, it doesn't use a reference counting GC and
there is a good reason for this: AFAIK modern GC are far better
performance wise than reference counting GC.
One GC that I remember (no I'm not an expert) is Mark Copy GC, see
"Mark-Copy: Fast Copying GC with Less Space Overhead".

The only thing that reference counting GCs have for them is that this
scheme is simple, more or less robust but "best", best for what?

I doubt very much that the best GC used for say batch computations is
the best for GUI clients.

Regards,
RenoX