Discussion:
The choice of threading primitives
(too old to reply)
Marcin 'Qrczak' Kowalczyk
2004-10-03 18:39:03 UTC
Permalink
Hello. I'm implementing threads in my language Kogut and I wonder
which set of threading primitives would provide the best compromise
between elegance, efficiency and expressiveness. I need your help.

Expressiveness itself causes two oppposite tensions. When threading
primitives are more expressive, when they permit more "exotic" things
like throwing exceptions to other threads, suspending threads, cloning
threads, blocking asynchronous exceptions etc. - then on one hand more
programs can be written in a modular way, without having to seed pure
computation with explicit checks related to threading - but on the
other hand it restricts the ways the language can be implemented.

If some feature (e.g. temporarily stopping all other threads) is
generally considered a bad design, then providing it would be worse
than omitting it. If some "too powerful" feature is used to implement
a less powerful one, then another implementation would lack both, even
if the other one could be provided differently. And I want the library
to be elegant. So it requires careful design.

I'm not experienced with multithreaded programming. I'm sure that
years of other people's experience has shown which design is good and
what is bad, and has identified common mistakes. I feel that I have
enough basic knowledge and common sense to tackle it with your help.
I'm asking mostly about the interface design.

For some technical reasons I'm implementing threads in userspace. This
means that I'm not constrained to a particular thread interface the OS
provides. I would not like to make it impossible to implement on top
of some other threading system (e.g. pthreads or .NET) without a
reason, but if some interesting feature is "impossible" for them and
doesn't disturb basic features, I may accept it, possibly as an
optional feature.

Basic assumptions: A semi-functional dynamically typed language,
similar to Dylan or Lisp. When safety and efficiency conflict,
it usually chooses safety. It's usually deterministic; of course
thread scheduling will not be.

Let's start with synchronization primitives. Pthreads provide mutexes
and condition variables, which can be used to build other things.
.NET provides mutexes and monitors. Am I right that a .NET monitor is
equivalent to a mutex together with an associated condition variable?

If so, is it possible to implement condition variables in terms of
monitors?

Why does pthread_cond_wait take the mutex as a parameter, instead of
the condition variable being permanently associated with some mutex
when it's created? Which would be better?

Concurrent Haskell provides MVars. I think they can be implemented
using mutexes and conditions in a straightforward way, and other
constructs might be easier to be efficiently implemented using mutexes
and conditions (i.e. MVars are sometimes too high-level), so I think
I will not give MVars - someone might implement them if he wants.
Good? Or is there some reason to provide them for a language which
does have mutable variables?

Let's suppose that I will have mutexes and condition variables.
I still don't know if it's the best choice, but let's suppose.
I can imagine the following ways to obain a lock, and similarly
for waiting for a condition:

1. Wait indefinitely.

2. Return immediately if it would block, return a boolean which tells
whether it succeeded.

3. Wait no longer than a specified timeout, return the amount of time
remaining or a value which tells that it succeeded (or return
someting else?).

The first is basic of course. Are others essential?

The third one can probably be emulated by starting a timer thread,
which throws an exception to the thread trying to obtain the lock.
But I don't know how to avoid a race condition when the timer stops
the other thread after it obtained the lock but before it has stopped
the timer. Is it possible? Maybe it would require e.g. checking
whether the lock is being held by the current thread. Or how?

What should happen when the same thread tries to obtain a mutex twice?
For example I think that Java allows the locking to pass, and then it
must be unlocked the appropriate number of times. Does this make
sense, or should it just deadlock (detection of deadlocks is another
story), or should it detect this immediately and throw an exception?

Are there any kinds of semaphores which can't be implemented using
mutexes and conditions should be provided as primitives? Or can they
be ignored at all, i.e. someone will implement a semaphore if he will
need it?

Any advices regarding synchronization primitives?

I will proceed with more questions later. Let's not do everything at
once.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
David Hopwood
2004-10-04 08:19:16 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Hello. I'm implementing threads in my language Kogut and I wonder
which set of threading primitives would provide the best compromise
between elegance, efficiency and expressiveness.
I would strongly suggest that you consider using message-passing and/or
event-loop concurrency (see the links for E below) rather than shared-state
concurrency. Recommended reading:

Peter Van Roy and Seif Haridi,
"Concepts, Techniques and Models of Computer Programming"
MIT Press, ISBN 0-262-22069-5, March 2004.
<http://www.info.ucl.ac.be/people/PVR/book.html>

Henry Baker and Carl Hewitt,
"Actors and Continuous Functionals"
MIT-LCS-TR-194, 1978.
<http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-194.pdf>
Also see <http://c2.com/cgi/wiki?ActorsModel>.

<http://c2.com/cgi/wiki?MessagePassingConcurrency>
Post by Marcin 'Qrczak' Kowalczyk
I need your help.
Expressiveness itself causes two oppposite tensions. When threading
primitives are more expressive, when they permit more "exotic" things
like throwing exceptions to other threads,
Asynchronous exceptions are seriously error-prone. You should understand
why async exceptions and thread suspension were deprecated in Java before
designing anything along these lines. Some information about this is at
<http://java.sun.com/j2se/1.3/docs/guide/misc/threadPrimitiveDeprecation.html>,
although it doesn't really express just how problematic Thread.stop(Throwable)
was, or why these primitives were impossible to fix.
Post by Marcin 'Qrczak' Kowalczyk
suspending threads, cloning
threads, blocking asynchronous exceptions etc. - then on one hand more
programs can be written in a modular way, without having to seed pure
computation with explicit checks related to threading - but on the
other hand it restricts the ways the language can be implemented.
If some feature (e.g. temporarily stopping all other threads) is
generally considered a bad design, then providing it would be worse
than omitting it. If some "too powerful" feature is used to implement
a less powerful one, then another implementation would lack both, even
if the other one could be provided differently. And I want the library
to be elegant. So it requires careful design.
It certainly does. Concurrency is one of the two most difficult parts of
language design (the other is type systems, for non-dynamically-typed
languages).
Post by Marcin 'Qrczak' Kowalczyk
Basic assumptions: A semi-functional dynamically typed language,
similar to Dylan or Lisp. When safety and efficiency conflict,
it usually chooses safety. It's usually deterministic; of course
thread scheduling will not be.
You should definitely look at the concurrency designs of:

- Oz <http://www.mozart-oz.org/> and the "Concepts, Techniques and
Models" book.
- E <http://www.erights.org/> and
<http://erights.org/elib/concurrency/event-loop.html>
- Erlang <http://www.erlang.org/> and
<http://http://www.sics.se/~joe/thesis/>.
Post by Marcin 'Qrczak' Kowalczyk
Let's start with synchronization primitives. Pthreads provide mutexes
and condition variables, which can be used to build other things.
.NET provides mutexes and monitors. Am I right that a .NET monitor is
equivalent to a mutex together with an associated condition variable?
If so, is it possible to implement condition variables in terms of
monitors?
All your questions are about low-level primitives. Choose a high-level
approach to concurrency first.
--
David Hopwood <***@blueyonder.co.uk>
Marcin 'Qrczak' Kowalczyk
2004-10-04 11:12:23 UTC
Permalink
Post by David Hopwood
I would strongly suggest that you consider using message-passing
and/or event-loop concurrency (see the links for E below) rather
than shared-state concurrency.
My intuition suggests that they can be implemented in terms of shared
state concurrency, but not conversely. Is this true?

If so, wouldn't it be good to have both? How important are designs
which use shared state concurrency that can't be easily translated
to the other models?

Unfortunately the higher level models seem to be highly dependent on
features of particular languages. For example in Erlang receiving a
message is coupled with pattern matching, and pattern matching in
Erlang can always be done in a bounded time and has no side effects -
this is not true for my language, which doesn't insist on being
*purely* functional.

Also in Erlang all data are immutable, except process handles which
implicitly refer to the local state of the thread. In my language
it would not make sense to restrict messages to be immutable data;
immutable values aren't *formally* distinguished. So one would have
to say what happens when you pass a mutable value, and this rules
out fully transparent distributiveness (not all data can be sensibly
transferred over a wire).

I find it hard to extract the complete essence of these models,
decoupled from a particular language. Can this essence be offered in
terms of types and functions etc., with no special language constructs
needed in their interface (like dataflow variables or pattern matching)?

Papers advocating the high level models tend to not talk about their
disadvantages. There are no panacea, I don't believe that they are
better in all respects. So what are their disadvantages besides being
harder to make fast? What designs they make hard or impossible to
express?
Post by David Hopwood
Asynchronous exceptions are seriously error-prone.
I guess this is because only at some points of time the state of the
thread is consistent; interrupting at an arbitrary point may cause
mutable data structures to be left in bad state. Right?

But without asynchronous exceptions a computation which is to be
interruptible must insert explicit checks whether someone wants it
to stop in all places where it's safe to be stopped, and do it often
enough that it doesn't lag too long before the next safe point.

GHC provides primitives for blocking and unblocking asynchronous
exceptions during execution of the given code. That way a computation
which is safe to be stopped doesn't have to execute the check at every
place, it's enough to declare it at the beginning.

Let's even suppose that asynchronous exceptions are blocked by
default, and can be unblocked on demand. Given that with mostly
functional programming there are many regions where it's safe to abort
at any point, is something wrong with this idea? It seems to be more
expressive than lack of asynchronous exceptions, and safe when used
correctly.
Post by David Hopwood
- Oz <http://www.mozart-oz.org/> and the "Concepts, Techniques and
Models" book.
- E <http://www.erights.org/> and
<http://erights.org/elib/concurrency/event-loop.html>
- Erlang <http://www.erlang.org/> and
<http://http://www.sics.se/~joe/thesis/>.
Thanks. I knew something about Oz and Erlang before, and knew the
book, but I'm not too familiar with these things - too much details to
grasp at once. It's hard to find technical details among the philosophy.
For example the "Event loop" URL says why the event loop approach is
superior to shared state, but doesn't say what this approach actually
*is*. The sources introduce various ideas incrementally, such that I
find it hard to see the complete image.

I'm still very confused.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Joe Seigh
2004-10-04 11:33:13 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by David Hopwood
I would strongly suggest that you consider using message-passing
and/or event-loop concurrency (see the links for E below) rather
than shared-state concurrency.
My intuition suggests that they can be implemented in terms of shared
state concurrency, but not conversely. Is this true?
Probably but that's not the logic employed by message passing advocates.
Their logic goes, "I can do everything I want using message passing, therefore
everybody can". I don't take them too seriously.

The conversely part is problematic. Protocols can become rather nasty and
complicated unless you restrict yourself to more simplistic problems.

Joe Seigh
Marcin 'Qrczak' Kowalczyk
2004-10-04 12:30:52 UTC
Permalink
Post by Joe Seigh
Post by Marcin 'Qrczak' Kowalczyk
My intuition suggests that they can be implemented in terms of
shared state concurrency, but not conversely. Is this true?
Probably but that's not the logic employed by message passing
advocates. Their logic goes, "I can do everything I want using
message passing, therefore everybody can". I don't take them too
seriously.
I'm in a bad position because I'm unable to judge myself whether I can
do everything I want using message passing, because of too little
experience in concurrent programming. That's why I came for help.
I want no flamewars.

A problem with higher level abstractions is that they must be got
right at the first time. If enough low level primitives are provided,
then appropriate abstractions can be built later without changing the
old language and old implementation, only extending it. But if it
provides only safe and convenient yet limited features, and they
happen to be insufficient, we are stuck and must go back and redesign
the core ideas.

Maybe there is a set of high level *and* expressive primitives which
is sufficient, such that nothing which can't be built from them will
ever be needed. But I don't know which it is, and I find hard to
extract them from the references I have. They talk about particular
mechanisms, and show how they are used, but they don't talk about
completeness and what *can't* be expressed with them, and how to work
around the apparent limitations or why they are not a problem in
practice.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Joe Seigh
2004-10-04 12:53:33 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Joe Seigh
Post by Marcin 'Qrczak' Kowalczyk
My intuition suggests that they can be implemented in terms of
shared state concurrency, but not conversely. Is this true?
Probably but that's not the logic employed by message passing
advocates. Their logic goes, "I can do everything I want using
message passing, therefore everybody can". I don't take them too
seriously.
I'm in a bad position because I'm unable to judge myself whether I can
do everything I want using message passing, because of too little
experience in concurrent programming. That's why I came for help.
I want no flamewars.
We don't have flamewars here, just strong opinions. :)
Seriously, what do you want? A lot of uncritical input?
Post by Marcin 'Qrczak' Kowalczyk
A problem with higher level abstractions is that they must be got
right at the first time. If enough low level primitives are provided,
then appropriate abstractions can be built later without changing the
old language and old implementation, only extending it. But if it
provides only safe and convenient yet limited features, and they
happen to be insufficient, we are stuck and must go back and redesign
the core ideas.
So provide all the low level primitives you can think of. I would avoid
providing one high level primitive like condvars "because everything
else can be derived from them" because it sets up an uneven playing
field. The "everything else" automatically becomes more inefficient.
With the low level primatives you can build all the high level api's
you want in the most efficient manner possible.
Post by Marcin 'Qrczak' Kowalczyk
Maybe there is a set of high level *and* expressive primitives which
is sufficient, such that nothing which can't be built from them will
ever be needed. But I don't know which it is, and I find hard to
extract them from the references I have. They talk about particular
mechanisms, and show how they are used, but they don't talk about
completeness and what *can't* be expressed with them, and how to work
around the apparent limitations or why they are not a problem in
practice.
It's more a matter of how efficiently and easily you can express something
using the primatives. For example, implementing condition variables
efficiently and correctly in win32. For example of what happens when
a language deliberately makes it difficult to implement "uncorrect" ideas,
JSR-166. That wasn't a tacked on api.

Joe Seigh
David Hopwood
2004-10-05 01:11:48 UTC
Permalink
Post by Joe Seigh
Post by Marcin 'Qrczak' Kowalczyk
Post by Joe Seigh
Post by Marcin 'Qrczak' Kowalczyk
My intuition suggests that they can be implemented in terms of
shared state concurrency, but not conversely. Is this true?
Probably but that's not the logic employed by message passing
advocates. Their logic goes, "I can do everything I want using
message passing, therefore everybody can". I don't take them too
seriously.
I'm in a bad position because I'm unable to judge myself whether I can
do everything I want using message passing, because of too little
experience in concurrent programming. That's why I came for help.
I want no flamewars.
We don't have flamewars here, just strong opinions. :)
Seriously, what do you want? A lot of uncritical input?
Providing at least some technical argument to support your point might
have been more productive.
Post by Joe Seigh
Post by Marcin 'Qrczak' Kowalczyk
A problem with higher level abstractions is that they must be got
right at the first time. If enough low level primitives are provided,
then appropriate abstractions can be built later without changing the
old language and old implementation, only extending it. But if it
provides only safe and convenient yet limited features, and they
happen to be insufficient, we are stuck and must go back and redesign
the core ideas.
So provide all the low level primitives you can think of.
Fortunately, the OP already understands why that is a bad idea.
--
David Hopwood <***@blueyonder.co.uk>
Alexander Terekhov
2004-10-04 11:55:22 UTC
Permalink
Marcin 'Qrczak' Kowalczyk wrote:
[...]
Post by Marcin 'Qrczak' Kowalczyk
Let's even suppose that asynchronous exceptions are blocked by
default, and can be unblocked on demand. Given that with mostly
functional programming there are many regions where it's safe to abort
at any point, is something wrong with this idea?
Nothing.
Post by Marcin 'Qrczak' Kowalczyk
It seems to be more
expressive than lack of asynchronous exceptions, and safe when used
correctly.
Right. And the language can help to achieve async-cancel
correctness. Stuff similar to const correctness in C/C++, I mean.

regards,
alexander.
Marcin 'Qrczak' Kowalczyk
2004-10-04 12:55:19 UTC
Permalink
Post by Alexander Terekhov
Right. And the language can help to achieve async-cancel
correctness. Stuff similar to const correctness in C/C++, I mean.
Hmm, I'm afraid it's incompatible with dynamic typing.

In GHC async exceptions are unblocked by default I think. Either
behavior can be emulated by the other, by letting the thread which
created the new thread wait for the change of the async blocking
state, so this is the question of convenience, not expressiveness.
Which is better, blocked by default or unblocked by default?
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Alexander Terekhov
2004-10-04 13:07:33 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Right. And the language can help to achieve async-cancel
correctness. Stuff similar to const correctness in C/C++, I mean.
Hmm, I'm afraid it's incompatible with dynamic typing.
I mean:

http://www.codesourcery.com/archives/c++-pthreads/msg00124.html

regards,
alexander.
Marcin 'Qrczak' Kowalczyk
2004-10-04 14:36:07 UTC
Permalink
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Right. And the language can help to achieve async-cancel
correctness. Stuff similar to const correctness in C/C++, I mean.
Hmm, I'm afraid it's incompatible with dynamic typing.
http://www.codesourcery.com/archives/c++-pthreads/msg00124.html
// only async-cancel-safe operations
// are allowed here. Error/ill-formed
// otherwise. for example:

It is incompatible. What if the code applies a function passed in a
variable?

Is the 'map' function (applies a given function to each element of a
list, return a list of results) async-cancel-safe? In reality it is or
is not, depending on whether its parameter is. With dynamic typing it
makes no sense to distinguish this. Even with static typing it would
have similar problems as Java-style exception specifications, e.g.
doesn't mix well with parametric polymorphism (Java generics).
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Alexander Terekhov
2004-10-04 15:43:17 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Right. And the language can help to achieve async-cancel
correctness. Stuff similar to const correctness in C/C++, I mean.
Hmm, I'm afraid it's incompatible with dynamic typing.
http://www.codesourcery.com/archives/c++-pthreads/msg00124.html
// only async-cancel-safe operations
// are allowed here. Error/ill-formed
It is incompatible. What if the code applies a function passed in a
variable?
Depends on function type. It must be async-cancel-safe.
Post by Marcin 'Qrczak' Kowalczyk
Is the 'map' function (applies a given function to each element of a
list, return a list of results) async-cancel-safe? In reality it is or
is not, depending on whether its parameter is. ...
Whatever. It's either async-cancel-safe or not. As for generic
programming, you might want to "propagate" async-cancel-
safety.

template<typename T>
void f(T & t) async_cancel_safety(T::f1(), T::f2()) {
t.f1();
sync_cancel { /* ... */ }
t.f2();
}

regards,
alexander.
Jesse Jones
2004-10-04 18:27:28 UTC
Permalink
Post by David Hopwood
Post by Marcin 'Qrczak' Kowalczyk
Hello. I'm implementing threads in my language Kogut and I wonder
which set of threading primitives would provide the best compromise
between elegance, efficiency and expressiveness.
I would strongly suggest that you consider using message-passing and/or
event-loop concurrency (see the links for E below) rather than shared-state
Peter Van Roy and Seif Haridi,
"Concepts, Techniques and Models of Computer Programming"
MIT Press, ISBN 0-262-22069-5, March 2004.
<http://www.info.ucl.ac.be/people/PVR/book.html>
I'd second the recommendation for this book. Very good intro to
different models of compuation and it spends a lot of time on different
models of parallel execution.

I also second the idea of looking at message passing concurrency. I'm
not an expert, but it seems like a much cleaner and simpler model than
shared-state concurrency. In particular take a look at the pi calculus:
it's a formal model for parallel computation in the same way the lambda
calculus is a model for sequential computation. The original papers by
Milner are a good introduction. Also worth looking at are the Pict and
Piccola languages.

If coarser-grained concurrency is adequate you might also want to look
into tuple spaces. They provide a very simple and flexible way for
concurrent processes to exchange information.

-- Jesse
David Hopwood
2004-10-04 08:36:51 UTC
Permalink
Hello. I'm implementing threads in my language Kogut [...]
Incidentally, on looking at <http://kokogut.sourceforge.net/kogut.html>,
the semantics of Kogut seem almost identical to E in most respects --
except that E doesn't have multiple dispatch, has soft typing
<http://c2.com/cgi/wiki?SoftTyping>, and you can't embed C code in it.
E is also a mostly-functional scripting language.
--
David Hopwood <***@blueyonder.co.uk>
Alexander Terekhov
2004-10-04 12:26:15 UTC
Permalink
Marcin 'Qrczak' Kowalczyk wrote:
[...]
Post by Marcin 'Qrczak' Kowalczyk
Let's start with synchronization primitives. Pthreads provide mutexes
and condition variables, which can be used to build other things.
.NET provides mutexes and monitors. Am I right that a .NET monitor is
equivalent to a mutex together with an associated condition variable?
Sort of.
Post by Marcin 'Qrczak' Kowalczyk
If so, is it possible to implement condition variables in terms of
monitors?
Yes.
Post by Marcin 'Qrczak' Kowalczyk
Why does pthread_cond_wait take the mutex as a parameter, instead of
the condition variable being permanently associated with some mutex
when it's created?
Permanent association would result in reinit for reuse with a
different mutex. That's not good. Explicit {re}binding call would
work quite well... but would somewhat complicate the API... that's
why, I guess.

regards,
alexander.
Marcin 'Qrczak' Kowalczyk
2004-10-04 12:51:04 UTC
Permalink
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
If so, is it possible to implement condition variables in terms of
monitors?
Yes.
As I understand, the raw advantage of condition variables over
monitors is that there can be several conditions associated with a
single mutex, such that only threads waiting for the appropriate
condition will be woken up.

So we should probably introduce a mutable variable which tells which
condition occurred, and a flag for each condition which tells whether
any thread is waiting for it. Signalling a condition would signal a
random waiting thread when any thread is waiting for the right
copndition.

When a thread wakes up and discovers it's not the condition it was
waiting for that has been triggered, it signals it again and goes
to sleep, right? How we can be sure that a thread waiting for the
condition will eventually be woken up? Does blocking have some
fairness properties which guarantee that?

Or is it done in some different way?

If it works, it seems a bit awkward. Is it somehow easier in practice?
Maybe whole designs are translated differently instead of emulating
condition variables? Is it better to have conditions or monitors as
the primitive? Or both?

I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
could be used instead, but I don't know any "complete" declarative
concurrency model.
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
Why does pthread_cond_wait take the mutex as a parameter, instead of
the condition variable being permanently associated with some mutex
when it's created?
Permanent association would result in reinit for reuse with a
different mutex.
But why would you want to use it with a different mutex?
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Alexander Terekhov
2004-10-04 13:00:17 UTC
Permalink
Marcin 'Qrczak' Kowalczyk wrote:

[... MS .Net monitors and condvars ...]

http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf
Post by Marcin 'Qrczak' Kowalczyk
I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
Apart from low level stuff a la POSIX, look at ADA's "protected
objects."
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Permanent association would result in reinit for reuse with a
different mutex.
But why would you want to use it with a different mutex?
http://google.com/groups?selm=slrn9ehssd.hq.kaz%40cafe.net

regards,
alexander.
Marcin 'Qrczak' Kowalczyk
2004-10-04 17:06:17 UTC
Permalink
Post by Alexander Terekhov
[... MS .Net monitors and condvars ...]
http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf
Thanks. This shows how to emulate waiting for separate conditions with
monitors. I find having multiple conditions much cleaner: I can
actually understand it, while I lost patience trying to convince
myself that this emulation works.

Since my language will require explicit synchronization objects anyway
(I will not waste a word per every object just to allow potential
locking on anything), monitors lose the "implicit association"
convenience and in overall conditions seem better.
Post by Alexander Terekhov
Apart from low level stuff a la POSIX, look at ADA's "protected
objects."
*google* *google*

I see. This seems easy to emulate using conditions, and wouldn't fit
my language directly.
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
But why would you want to use it with a different mutex?
http://google.com/groups?selm=slrn9ehssd.hq.kaz%40cafe.net
This is purely an optimization issue: it allows to reuse a condition
variable instead of creating it afresh. I think it's not that
important, so I would prefer to associate a mutex with a condition
when the condition is created.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-04 23:18:02 UTC
Permalink
Post by Alexander Terekhov
[... MS .Net monitors and condvars ...]
http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf
Post by Marcin 'Qrczak' Kowalczyk
I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
Apart from low level stuff a la POSIX, look at ADA's "protected
objects."
The "post-java" threading model of MS automatizes some work the pthread user
has to do by hand, but it's quite an equivalent model; you may do it with a
few very well organized defines. As such, it is probably a model that
induces some efficiency defect, if not directly in the threading management
as a latent overhead in the whole VM, but stability and easy-to-useness is
the primary goal of a scripting language (including C#), so in general the
trade-off is worth. You can't forget to unlock a mutex or to register a
cleanup condition in case of cancelation, because this safety is
implemented in the grammar rater than left to the programmer.

Anyhow, this comes to a cost: I use to say that "there's no shortcut to
correct threading". There's a very good reason why non-recursive mutexes
have been the only "official" ones in the posix model for a long time, and
is not a matter of efficiency.

I won't descend into details here, but to do real high performance
threading, or simply correct threading on very complex project, this
implementation of the primitives is a little clumsy. Anyhow, they are great
to do the programs you may want to do with C#, so I suppose they are
perfectly shaped for their task.

Giancarlo.
Giancarlo Niccolai
2004-10-04 16:14:43 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
If so, is it possible to implement condition variables in terms of
monitors?
Yes.
As I understand, the raw advantage of condition variables over
monitors is that there can be several conditions associated with a
single mutex, such that only threads waiting for the appropriate
condition will be woken up.
Not just that. Conditions variables are also more flexible, allowing to test
for a predicate (condition) rather than testing generically for "an object
to be free for mangling". With a cond, you may i.e. sleep until a task is
done 100 times; with a monitor guarding an object, you should enter it 100
times and check an internal mangling counter to achive a similar result. In
threading, object abstraction is not important as task abstraction.
Post by Marcin 'Qrczak' Kowalczyk
So we should probably introduce a mutable variable which tells which
condition occurred, and a flag for each condition which tells whether
any thread is waiting for it. Signalling a condition would signal a
random waiting thread when any thread is waiting for the right
copndition.
Sometimes. Other times, just signaling a condition is enough (i.e. when the
likelyhood that there is at least a thread waiting is very high).
Post by Marcin 'Qrczak' Kowalczyk
When a thread wakes up and discovers it's not the condition it was
waiting for that has been triggered, it signals it again and goes
to sleep, right? How we can be sure that a thread waiting for the
condition will eventually be woken up? Does blocking have some
fairness properties which guarantee that?
No: it is said in official docs that a condition must be broadcasted (that
is, ALL the waiting thread MUST be woken up) UNLESS it can be demonstrated
that it is necessary just for one thread to be waken up. This is the case
of the semaphores: when rising the count from -1 to 0, one and only one of
the waiting thread (if any) can gain access to the semaphore without being
re-blocked, so you don't broadcast, just signal.

Otherwise, you have to broadcast, and every waiting thread has the right to
proceed if the predicate is verified.
Post by Marcin 'Qrczak' Kowalczyk
Or is it done in some different way?
Well written POSIX implementation should do avoid to wake a thread that is
not going to be able to gain the associated mutex immediately. Yet, is a
requirement that all the sleeping threads waiting on a condition are woken
up on broadcast; a well written algo should make them "wakeable" and do the
real wake as soon as the mutex is found free, but this must be done
carefully (may bring problems if not well done) and this optimization is
not a requirement.
Post by Marcin 'Qrczak' Kowalczyk
If it works, it seems a bit awkward. Is it somehow easier in practice?
Maybe whole designs are translated differently instead of emulating
condition variables? Is it better to have conditions or monitors as
the primitive? Or both?
Conditions; monitors are not primitives, they are built on primitives. It
"seems awkward", but it's the only way to do things safe, even in a object
oriented environment. In example, a monitor just puts you at sleep until
you are the only owner of the monitor; Immagine an object with 100 of
members that interests, 10 by 10, different threads. While they may run
concurrently without a halt, a monitor approach would block them all.
Also, consider read-only access patterns. Finally, consider SIGNALING. That
is, a thread may consider an object, and then decide that is NOT worth to
alter it; a signal may be issued only if there is a thread really waiting
for a certain aspect of the object being changed in a certain way; this
cannot be expressed with the monitor concept. I.e. you may have a thread
alive, but sleeping for hours until a certain count field in the object
reaches 0, and then immediately taking control. With a monitor apporach,
you have to periodically check for the count to have reached a certain
value by repeatedly entering the monitor (and so blocking the whole
object/doing useles cx switches).

Also, as the condition wait is a cancelation point, you know that a thread
waiting for a condition is ready to be canceled. What about a thread
willing to gain a monitor? There's no guarantee that a thread that MUST
access an object has its internal status coherent; you may even have nested
monitor accesses (calling an object method from another object's method),
and in this case the "readyness" of a thread to be canceled while waiting
for an object to become enterable is, at best, context-depending (that is
to say, "questionable").
Post by Marcin 'Qrczak' Kowalczyk
I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
could be used instead, but I don't know any "complete" declarative
concurrency model.
See http://wefts.sf.net. Anyhow, I am sorry but everything more
"expressive" than mutex and conditions reduces your ability to express
concepts. You can't build a doll house if you have 1toon brick; you can't
do serious threading if you have just monitors.

There is a theoretical reason why mutex and events (and expecially "guarded
events" as the conditions are) are necessary and sufficient to solve all
the synchronization problems of multithreading; it depends on the
complexity levels rising from agents coordination. I'll post an article
soon.
Post by Marcin 'Qrczak' Kowalczyk
Post by Alexander Terekhov
Post by Marcin 'Qrczak' Kowalczyk
Why does pthread_cond_wait take the mutex as a parameter, instead of
the condition variable being permanently associated with some mutex
when it's created?
Permanent association would result in reinit for reuse with a
different mutex.
But why would you want to use it with a different mutex?
You may want to use the same mutex with a different condition. Immagine you
have two guardian threads for a value; the value is the same, but one
thread must take control if the item value falls below 0, while the other
fires if the item value rises above 10. Same mutex (same data to be
guarded) different condition variable (different predicate to be checked).

Got the point? Conditions guard predicates, mutexes guard data.

Giancarlo Niccolai
Marcin 'Qrczak' Kowalczyk
2004-10-04 17:56:25 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
As I understand, the raw advantage of condition variables over
monitors is that there can be several conditions associated with a
single mutex, such that only threads waiting for the appropriate
condition will be woken up.
Not just that. Conditions variables are also more flexible, allowing
to test for a predicate (condition) rather than testing generically
for "an object to be free for mangling". With a cond, you may i.e.
sleep until a task is done 100 times; with a monitor guarding an
object, you should enter it 100 times and check an internal mangling
counter to achive a similar result.
I don't understand. If the only condition we are interested in is
"the task has been done 100 times", then a monitor would be equally
good, wouldn't it? Whoever does the task, would check if this is the
100th time, and signal/notify/ /pulse only then.
Post by Giancarlo Niccolai
Well written POSIX implementation should do avoid to wake a thread
that is not going to be able to gain the associated mutex immediately.
Well, it can never gain it immediately, because it's being held by the
thread which called signal (unless we permit signalling without
holding the mutex - should we?). As I understand it, signal just moves
a waiting thread from the queue waiting for signal to the queue
waiting to obtain the mutex - it will not be woken up immediately,
but when the mutex is unlocked, right?
Post by Giancarlo Niccolai
There is a theoretical reason why mutex and events (and expecially
"guarded events" as the conditions are) are necessary and sufficient
to solve all the synchronization problems of multithreading; it
depends on the complexity levels rising from agents coordination.
I'll post an article soon.
Interesting. But mutexes and conditions with which operations?
These are the basic ones (besides creation of these objects and
forking threads):

Lock m
Unlock m
Wait c (or: Wait c m, if the mutex is not being associated permanently)
Signal c
SignalAll c

but there are also others - are they essential as well, or they are
better emulated or worked around?:

TimedLock m timeout
TimedWait c timeout
TryLock m (special case of TimedLock with 0 timeout)

The paper "Threads yield continuations"
<http://www.cs.indiana.edu/~dyb/papers/subK.ps>,
which says how to implement Scheme-like continuations or even
"subcontinuations" in terms of threads, adds different operations:

Unlock' m thread (passes it to this thread)
Wait' c thread (passes the mutex to this thread)
Suspend thread
Release thread

and also cloning a thread for multishot continuations. BTW, this paper
also associates a mutex with a condition permanently instead of
specifying it for a Wait call.

Then there are asynchronous exceptions in various flavors (either only
explicit cancellation points, or cancellation ranges of computation,
or no protection). The cancellation ranges version seems to be the
most expressive.

And I don't know what else.

I don't know which subset of this is essential.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
But why would you want to use it with a different mutex?
You may want to use the same mutex with a different condition.
Yes, but I was talking about using the same condition with a different
mutex! For now I conclude that this ability is not essential, it's
very easily emulated by using separate conditions, so a simpler to use
interface which permanently associates a mutex with a condition might
be better.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-04 21:20:43 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
As I understand, the raw advantage of condition variables over
monitors is that there can be several conditions associated with a
single mutex, such that only threads waiting for the appropriate
condition will be woken up.
Not just that. Conditions variables are also more flexible, allowing
to test for a predicate (condition) rather than testing generically
for "an object to be free for mangling". With a cond, you may i.e.
sleep until a task is done 100 times; with a monitor guarding an
object, you should enter it 100 times and check an internal mangling
counter to achive a similar result.
I don't understand. If the only condition we are interested in is
"the task has been done 100 times", then a monitor would be equally
good, wouldn't it? Whoever does the task, would check if this is the
100th time, and signal/notify/ /pulse only then.
Yes, but the waiting thread would be totally idle (that is, consuming
exactly 0 cpu) up to the moment in which that pulse is issued (with some
minor exceptions that are not worth discussing here). In a monitor model,
the thread willing to know if a certain count has been hit should re-enter
the monitor and check the count to be reached with a given periodicity.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Well written POSIX implementation should do avoid to wake a thread
that is not going to be able to gain the associated mutex immediately.
Well, it can never gain it immediately, because it's being held by the
thread which called signal (unless we permit signalling without
holding the mutex - should we?).
Yes, it is both possible and legal.
Post by Marcin 'Qrczak' Kowalczyk
As I understand it, signal just moves
a waiting thread from the queue waiting for signal to the queue
waiting to obtain the mutex - it will not be woken up immediately,
but when the mutex is unlocked, right?
It depends from the implementation, the number of real processors and so on.
IN the theoretical model, all the threads are immediately woken and begin
immediately to contend for the mutex; implementations are free to
"optimize" this behavior provided the effect is the same.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
There is a theoretical reason why mutex and events (and expecially
"guarded events" as the conditions are) are necessary and sufficient
to solve all the synchronization problems of multithreading; it
depends on the complexity levels rising from agents coordination.
I'll post an article soon.
Interesting. But mutexes and conditions with which operations?
These are the basic ones (besides creation of these objects and
Lock m
Unlock m
Wait c (or: Wait c m, if the mutex is not being associated
permanently) Signal c
SignalAll c
but there are also others - are they essential as well, or they are
TimedLock m timeout
TimedWait c timeout
TryLock m (special case of TimedLock with 0 timeout)
Timed lock, timed wait and trylock are so called "semantics", that is, ways
in which you USE the primitives. The condiction is the primitive, broadcast
and signaling it are semantics.
Post by Marcin 'Qrczak' Kowalczyk
The paper "Threads yield continuations"
<http://www.cs.indiana.edu/~dyb/papers/subK.ps>,
which says how to implement Scheme-like continuations or even
Unlock' m thread (passes it to this thread)
Wait' c thread (passes the mutex to this thread)
Suspend thread
Release thread
and also cloning a thread for multishot continuations. BTW, this paper
also associates a mutex with a condition permanently instead of
specifying it for a Wait call.
You should not believe in everything you hear ;-). You cannot suspend
another thread as its internal status is undetermined outside itself. Don't
get me wrong: I am not pretentious nor egocentric, the discussion about how
bad a threading model can be designed in even famous languages (and APIs,
including Microsoft Process SDK), has already been undertaken in this
group, I am just reporting the result.

In a few words, you can't control a thread control flow from another thread.
Take this example: you are driving a car and there's a blind person beside
you that is telling you what to do as he is wired up with sensors. But the
sensors are 1/10th second late, he has human reaction times and you have
human reaction times when he says what you should do. When he tells you
something, you usually act too late.

That's the thread situation; there's no reliable way to communicate internal
thread status unless a mutex is involved; and if you suspend a thread
without knowing why it is running, you may find later on that he was
carrying out a vital task.
Post by Marcin 'Qrczak' Kowalczyk
Yes, but I was talking about using the same condition with a different
mutex! For now I conclude that this ability is not essential, it's
very easily emulated by using separate conditions, so a simpler to use
interface which permanently associates a mutex with a condition might
be better.
You should never use a "condition with a different mutex", under normal
operational behavior; you may do it if, for any reason, a condition
survives while a mutex must be recreated in your program logic (usually you
should not), or if the usefulness of a mutex ceases and another one becomes
more useful, but this in a DEFINITIVE fashon; you just can't mix different
mutexes with a single condition wait or signaling. In Wefts++ there's a
class that binds a mutex with a condition, if you are meaning that, but it
does not "forces" to use only the mutex-condition bound version; is just an
utility for the common case.

bests.

Giancarlo.
Marcin 'Qrczak' Kowalczyk
2004-10-04 22:14:48 UTC
Permalink
Post by Giancarlo Niccolai
Timed lock, timed wait and trylock are so called "semantics", that
is, ways in which you USE the primitives. The condiction is the
primitive, broadcast and signaling it are semantics.
A type with no operations is useless (except as a placeholder).
Operations are as important as types.

And I really don't know whether I should complicate the implementation
with timed locking or leave this to implementation using separate threads
(if it's possible at all).

What bothers me with specifyinf a timeout is that it's just one of
possible ways we might want to limit waiting. I could as well say
"wait for this lock, but not infinitely - only until someone signals
this condition". But how to implement that? It seems to need
cancellation (only during waiting) and something to protect against
being cancelled just after we obtain the lock, before we stop the
thread which is going to cancel us.

TryLock is easy when we have timed lock, but it seems impossible with
just plain locking. Is it something which is needed, or the design can
be generally worked around?
Post by Giancarlo Niccolai
the discussion about how bad a threading model can be designed in
even famous languages (and APIs, including Microsoft Process SDK),
has already been undertaken in this group, I am just reporting the
result.
Great! I came for there results :-)
Post by Giancarlo Niccolai
In a few words, you can't control a thread control flow from another thread.
Ok, no stopping/resuming, and I will leave the fate of asynchronous
exceptions until later considerations. But some cancellation is
necessary to be able to write a computation which waits for the result
of one of two blocking sources, whichever comes first. .NET provides
some Interrupt method, which either cancels ongoing waiting or
registers to cancel the next future waiting if the process is not
waiting at this moment. Is this a good idea? I still don't know how to
use it for this case.

It seems impossible to pick only one resource, there will always be
cases possible when we get both, right? And if this happens, I don't
know how to implement cancellation properly in order to at least
detect such case reliably.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-04 22:59:00 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Timed lock, timed wait and trylock are so called "semantics", that
is, ways in which you USE the primitives. The contheoretically
primitive, broadcast and signaling it are semantics.
A type with no operations is useless (except as a placeholder).
Operations are as important as types.
Of course. It has the same importance as avoiding confusing a type
(condition variable) with its operations (semantics).
Post by Marcin 'Qrczak' Kowalczyk
And I really don't know whether I should complicate the implementation
with timed locking or leave this to implementation using separate threads
(if it's possible at all).
Many ppl here thinks that timed locking is a brain-damaged technique. I am
not one of them, but I can definitely live without it. There are pros' and
cons'; the biggest cons' is that the status of a thread is the most
undefined when waiting for a mutex. What should happen then if a thread is
canceled while waiting for a mutex? There's no definitive answer to that.
And what would be the good of waiting i.e. 10 seconds to achieve a lock?
Mutex are MEANT to be locked for a small time as a nanosecond, more means a
broken logic. (well I am rude here but expressing an extreme position, not
exactly a "correct" one).

I implement a "special mutex" that can be trylocked, but it's meant for
slower operation, and is just a wrapping around a condition wait that
APPEARES to be a mutex on the outside. Still, I fail to see it's absolute
necessity (again, I implement it as an utility more than a real primitive
semantic).
Post by Marcin 'Qrczak' Kowalczyk
What bothers me with specifyinf a timeout is that it's just one of
possible ways we might want to limit waiting. I could as well say
"wait for this lock, but not infinitely - only until someone signals
this condition". But how to implement that? It seems to need
cancellation (only during waiting) and something to protect against
being cancelled just after we obtain the lock, before we stop the
thread which is going to cancel us.
See wefts++ XMutex class. It's a condition/mutex loop that externally
appears as a mutex; it's status is always defined: it can't be canceled
when the lock has been achieved, and if it's canceled, the lock is surely
not held. Anyhow, despise it's quite easy to remove the uncertainness in
the wait/hold mutex sequence, I repeat, it's necessity has to be definitely
demonstrated.
Post by Marcin 'Qrczak' Kowalczyk
TryLock is easy when we have timed lock, but it seems impossible with
just plain locking. Is it something which is needed, or the design can
be generally worked around?
Useful in some situation, but rarely vital. I would say, quite never. If not
present in the low level system (now pthread and win SDK has it), it can be
easily implemented at assembly level via a compare-and-swap operation. It's
useful if i.e. you may want to fill your timeslice with something useful if
you had not achieved the lock immediately, and you may have something that
does not requires locking to do... but if you have, why do the lock in the
first place? The utility of trylocking has been questioned in this group
many times, and a definitive answer has not been given. I would say "it's
cheap, let it in", yet I am not able to determine how a program may be
really improved in a major way with trylocking.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
In a few words, you can't control a thread control flow from another thread.
Ok, no stopping/resuming, and I will leave the fate of asynchronous
exceptions until later considerations. But some cancellation is
necessary to be able to write a computation which waits for the result
of one of two blocking sources, whichever comes first. .NET provides
some Interrupt method, which either cancels ongoing waiting or
registers to cancel the next future waiting if the process is not
waiting at this moment. Is this a good idea? I still don't know how to
use it for this case.
Yes: "deferred" cancellation is the way to go. A deferred cancellation is a
cancellation that is performed as soon as the cancelled thread is in a
state that is known to be perfectly cancellable (at least theoretically).
Posix individuates what are said to be "cancellation points" that are spots
at which the thread must arrive "ready to be cancelled". Generically, they
are all the operations that may put the thread in wait (this both respects
the theoretical model and give implementors a nice hook where to put
cancellation tests). Win SDK had not such an ability, and I supposed they
tried to avoid this mistake in .NET (copying the posix model).

There are two types of deferred cancellations: kind and unkind. A kind
request must be accepted by the cancelled thread to be effective; that is,
I may present myself to a cancellation point knowing I am NOT ready to be
cancelled. Instead of making myself ready, I may preventively deny
cancellation requests, so that if a request is issued during wait, or if it
was already pending, it is NOT fulfilled even if I am put in wait. This is
dangerous and not elegant, but sometime useful (i.e. it may be used in a
timed wait from which I know I am gonna wake at a given time, but it should
not be used for waits that may last a random or infinite time as network
reads). The target thread may and should be even more cooperative by
willfully test for cancellation requests and fulfilling them in spots where
its status is clean (is what posix pthread_testcancel() does).

All this methods puts the responsibility of cancellation in charge of the
cancelled thread, which is the only one that can exactly know what its
current status is. If you think about it it's quite natural; immagine a
multiprocessor environment: how can another thread running in another
processor get a reliable snapshot of the register and cache memory of the
first processor IN TIME TO DO ANYTHING USEFUL? Only the thread that is
running on the first processor can know that its registers and cache memory
are exactly the ones that he sees.
Post by Marcin 'Qrczak' Kowalczyk
It seems impossible to pick only one resource, there will always be
cases possible when we get both, right? And if this happens, I don't
know how to implement cancellation properly in order to at least
detect such case reliably.
Again, Have a look at Wefts++ lib. It's not a bible of knowledge, I am
sorry, but its doc explains in detail why some implementing choice has been
done, and it has undergone some review by some member of this group, so
even if it may not be "perfect", it is at least "right".

Giancarlo.
Jonathan Adams
2004-10-05 01:43:13 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
TryLock is easy when we have timed lock, but it seems impossible with
just plain locking. Is it something which is needed, or the design can
be generally worked around?
Useful in some situation, but rarely vital. I would say, quite never. If not
present in the low level system (now pthread and win SDK has it), it can be
easily implemented at assembly level via a compare-and-swap operation. It's
useful if i.e. you may want to fill your timeslice with something useful if
you had not achieved the lock immediately, and you may have something that
does not requires locking to do... but if you have, why do the lock in the
first place? The utility of trylocking has been questioned in this group
many times, and a definitive answer has not been given. I would say "it's
cheap, let it in", yet I am not able to determine how a program may be
really improved in a major way with trylocking.
I've seen it used to measure contention on a lock:

if (pthread_mutex_trylock(&s->mtx) != 0) {
(void) pthread_mutex_lock(&x->mtx);
x->contention++;
}

this can allow an adaptive response to contention levels. I've also seen
it used for a fast-path for out-of-order acquires:

if (pthread_mutex_trylock(&mtx1) != 0) {
(void) pthread_mutex_unlock(&mtx2);
(void) pthread_mutex_lock(&mtx1);
(void) pthread_mutex_lock(&mtx2);
/* consistency check here, if needed */
}

Cheers,
- jonathan
Giancarlo Niccolai
2004-10-05 12:41:19 UTC
Permalink
Post by Jonathan Adams
if (pthread_mutex_trylock(&s->mtx) != 0) {
(void) pthread_mutex_lock(&x->mtx);
x->contention++;
}
this can allow an adaptive response to contention levels.
Oh, THIS is cool.
Post by Jonathan Adams
I've also seen
if (pthread_mutex_trylock(&mtx1) != 0) {
(void) pthread_mutex_unlock(&mtx2);
(void) pthread_mutex_lock(&mtx1);
(void) pthread_mutex_lock(&mtx2);
/* consistency check here, if needed */
}
Cheers,
- jonathan
Uhm, I like this a little less; just because I don't like cross-locking
(that is, lock two mutexes at the same time). But if well handled, it may
be fine.

Giancarlo.
Joe Seigh
2004-10-05 13:08:42 UTC
Permalink
Post by Jonathan Adams
if (pthread_mutex_trylock(&s->mtx) != 0) {
(void) pthread_mutex_lock(&x->mtx);
x->contention++;
}
I've tried that. It didn't seem to give the expected results.
Either that or contention doesn't produce the behavior I thought
it would. I ended up ripping the code out as being a producer of
meaningless performance statistics.

Joe Seigh
Jonathan Adams
2004-10-05 15:10:19 UTC
Permalink
Post by Joe Seigh
Post by Jonathan Adams
if (pthread_mutex_trylock(&s->mtx) != 0) {
(void) pthread_mutex_lock(&x->mtx);
x->contention++;
}
I've tried that. It didn't seem to give the expected results.
Either that or contention doesn't produce the behavior I thought
it would. I ended up ripping the code out as being a producer of
meaningless performance statistics.
Hrm. It's hard to say what happened -- The details of what trylock does
are implementation-dependent. (which may be a drawback of this approach)

The main example I'm familiar with is covered in
http://www.usenix.org/event/usenix01/bonwick.html

the following paragraph is the relevant one:

Rather than picking some "magic value," we designed the magazine
layer to tune itself dynamically. We start each object cache with a
small value of M and observe the contention rate on the depot lock.
We do this by using a non-blocking trylock primitive on the depot
lock; if that fails we use the ordinary blocking lock primitive and
increment a contention count. If the contention rate exceeds a
fixed threshold we increase the cache's magazine size. We enforce a
maximum magazine size to ensure that this feedback loop can't get
out of control, but in practice the algorithm behaves extremely well
on everything from desktops to 64-CPU Starfires. The algorithm
generally stabilizes after several minutes of load with reasonable
magazine sizes and depot lock contention rates of less than once per
second.

Looking at the statistics on a handy busy NFS server, the contention
rates (as measured by the above) are < 1/1000. For example, the cache
with the busiest "depot" had 13,253,331 depot interactions, leading to
3,259 contention events, which is 0.025%. The server's been up for four
days (we upgrade it bi-weekly to the latest development release).

The threshold for re-sizing at the moment is 3 contention events in 15
seconds.

- jonathan
Joe Seigh
2004-10-05 15:25:01 UTC
Permalink
Post by Jonathan Adams
Post by Joe Seigh
I've tried that. It didn't seem to give the expected results.
Either that or contention doesn't produce the behavior I thought
it would. I ended up ripping the code out as being a producer of
meaningless performance statistics.
Hrm. It's hard to say what happened -- The details of what trylock does
are implementation-dependent. (which may be a drawback of this approach)
The main example I'm familiar with is covered in
http://www.usenix.org/event/usenix01/bonwick.html
Rather than picking some "magic value," we designed the magazine
layer to tune itself dynamically. We start each object cache with a
small value of M and observe the contention rate on the depot lock.
We do this by using a non-blocking trylock primitive on the depot
lock; if that fails we use the ordinary blocking lock primitive and
increment a contention count. If the contention rate exceeds a
fixed threshold we increase the cache's magazine size. We enforce a
maximum magazine size to ensure that this feedback loop can't get
out of control, but in practice the algorithm behaves extremely well
on everything from desktops to 64-CPU Starfires. The algorithm
generally stabilizes after several minutes of load with reasonable
magazine sizes and depot lock contention rates of less than once per
second.
Looking at the statistics on a handy busy NFS server, the contention
rates (as measured by the above) are < 1/1000. For example, the cache
with the busiest "depot" had 13,253,331 depot interactions, leading to
3,259 contention events, which is 0.025%. The server's been up for four
days (we upgrade it bi-weekly to the latest development release).
The threshold for re-sizing at the moment is 3 contention events in 15
seconds.
I was trying to establish correlation between several dozen or so variables
and overall performance, looking especially for variables that were early
predicters of performance degradation. There didn't seem to be any correlation
between "contention" on the locks I was measuring and performance. So either
I was seeing some sort of side effect or contention doesn't necessarily effect
performance the way we think it does.


Joe Seigh
Jonathan Adams
2004-10-08 09:55:00 UTC
Permalink
Post by Joe Seigh
Post by Jonathan Adams
Post by Joe Seigh
I've tried that. It didn't seem to give the expected results.
Either that or contention doesn't produce the behavior I thought
it would. I ended up ripping the code out as being a producer of
meaningless performance statistics.
Hrm. It's hard to say what happened -- The details of what trylock does
are implementation-dependent. (which may be a drawback of this approach)
...
I was trying to establish correlation between several dozen or so variables
and overall performance, looking especially for variables that were early
predicters of performance degradation. There didn't seem to be any correlation
between "contention" on the locks I was measuring and performance. So either
I was seeing some sort of side effect or contention doesn't necessarily effect
performance the way we think it does.
What threading library was this? What OS?

Certainly, detailed contention statistics (how many spins / how long
blocked) are much better performance indicators than a boolean "did I
block?"

- jonathan
Marcin 'Qrczak' Kowalczyk
2004-10-05 11:10:32 UTC
Permalink
Post by Giancarlo Niccolai
Mutex are MEANT to be locked for a small time as a nanosecond, more
means a broken logic.
This argument doesn't apply to timed wait for a condition.
Post by Giancarlo Niccolai
I implement a "special mutex" that can be trylocked, but it's meant
for slower operation, and is just a wrapping around a condition wait
that APPEARES to be a mutex on the outside.
Ok, this is enough for me to leave it out.
Post by Giancarlo Niccolai
The utility of trylocking has been questioned in this group many
times, and a definitive answer has not been given. I would say "it's
cheap, let it in", yet I am not able to determine how a program may
be really improved in a major way with trylocking.
Fortunately providing or not providing it doesn't have impact on other
things, e.g. it doesn't require changing implementation of a mutex,
so it doesn't matter much.
Post by Giancarlo Niccolai
There are two types of deferred cancellations: kind and unkind.
A kind request must be accepted by the cancelled thread to be
effective; that is, I may present myself to a cancellation point
knowing I am NOT ready to be cancelled. Instead of making myself
ready, I may preventively deny cancellation requests, so that if a
request is issued during wait, or if it was already pending, it is
NOT fulfilled even if I am put in wait.
It's getting complicated. How would an interface look like?

Pthread has:

int pthread_setcancelstate(int state, int *oldstate);
int pthread_setcanceltype(int type, int *oldtype);
void pthread_testcancel(void);

I can't find descriptions of the meaning of states and types. There is
PTHREAD_CANCEL_ENABLE and PTHREAD_CANCEL_DISABLE. What happens if a
cancel request comes while the thread has disabled cancellation, and
it later enables it? Does the pending cancellation fire then?

The type is PTHREAD_CANCEL_DEFERRED and PTHREAD_CANCEL_ASYNCHRONOUS.
Does deferred mean that cancellation is possible on particular
functions, primarily those which block, and asynchronous that it can
happen at any time (as long as it's enabled)? Where can I find the
list of functions which allow deferred cancellation?

If I understand this correctly, then I like the interface. It makes
sense to provide "brackets" which set these values around a block
of code. Should explicit setting as statements be provided too, or
brackets are sufficient? Does cancel state bracket make sense only in
one direction: to disable cancellation? I guess cancel type bracket
makes sense in both ways.

Is there a way to "clear" the pending cancellation flag for the
current thread? I tried to imagine how to implement a timeout for
blocking operations, and it seems that such clearing would help.

What happens when a wait is being cancelled: does the thread reacquire
the mutex, since the code around wait assumes that the mutex is locked?
But it might be being locked by another thread. So I don't know how it
should work at all.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-05 12:37:08 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Mutex are MEANT to be locked for a small time as a nanosecond, more
means a broken logic.
This argument doesn't apply to timed wait for a condition.
In fact! Conditions wait are meant to WAIT, mutex lock waits are meant to
ACHIEVE THE LOCK. Mutex locks are NOT meant to wait. It's exacly what I
sayed. Staying on a mutex lock for more than a few microseconds (excluding
the context switch time, of course), means that you are "waiting" to
achieve the lock; this is NOT what mutex are meant for; they are meant for
mutual exclusion, and not for wait. And of course, this does not apply to
timed wait for conditions.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
There are two types of deferred cancellations: kind and unkind.
A kind request must be accepted by the cancelled thread to be
effective; that is, I may present myself to a cancellation point
knowing I am NOT ready to be cancelled. Instead of making myself
ready, I may preventively deny cancellation requests, so that if a
request is issued during wait, or if it was already pending, it is
NOT fulfilled even if I am put in wait.
It's getting complicated. How would an interface look like?
int pthread_setcancelstate(int state, int *oldstate);
int pthread_setcanceltype(int type, int *oldtype);
void pthread_testcancel(void);
From pthread manual:

"pthread_setcancelstate" changes the cancellation state for the calling
thread -- that is, whether cancellation requests are ignored or not. The |
state| argument is the new cancellation state:
either "PTHREAD_CANCEL_ENABLE" to enable cancellation,
or "PTHREAD_CANCEL_DISABLE" to disable cancellation (cancellation requests
are ignored).

This is what I was talking about the ability of the thread to deny kind
cancellation requests. Then you have pthread_setcanceltype, which may be
synchronous or asynchronous. It's a widespread belief of this group that
asynchronous cancellation should NEVER be used UNLESS for very special
reasons. One example may be a debug feautre in a program that MUST stop an
idle thread; another example may be a calculation intensive thread that can
be demostrated not to hold any resource for a long period of time; as this
is known only by the thread itself, the ability to allow asynchronous
cancellation requests are upon the thread:

... normal ops ...
//going to do some calc intensive task
set asynchronous cancellation

... do some calc that does not require shared resources ...
... just CPU and local memory access ...

... set deferred cancellation ...
// going to tell everyone about my results.
...normal ops...

This allows to avoid doing periodic checks to pthread_testcancel() or
equivalent in the calc intensive loops, sparing some clock cycles. In real
world programs this usually does not happens, or would not help the calc
time very much, but in some math simulation it may be a common case.
Post by Marcin 'Qrczak' Kowalczyk
I can't find descriptions of the meaning of states and types. There is
PTHREAD_CANCEL_ENABLE and PTHREAD_CANCEL_DISABLE. What happens if a
cancel request comes while the thread has disabled cancellation, and
it later enables it? Does the pending cancellation fire then?
I had the same question once and Mr. Butenhof replied: the cancellation
request is left "pending". When you reenable cancellation, if the type of
cancellation is "deferred" (kind), it will be fulfilled at the first
cancellation point from that moment on, if there is one. If it's
asynchronous (unkind), it will be fulfilled "as soon as possible". In
example, an implementation may kill the thread when it gives off its time
slice. Others may be able to kill the thread in a few processor
instructions time. Consider "as soon as possible" in its widest meaning.
Post by Marcin 'Qrczak' Kowalczyk
The type is PTHREAD_CANCEL_DEFERRED and PTHREAD_CANCEL_ASYNCHRONOUS.
Does deferred mean that cancellation is possible on particular
functions, primarily those which block, and asynchronous that it can
happen at any time (as long as it's enabled)? Where can I find the
list of functions which allow deferred cancellation?
Yes, you got it ;-)

This is the list that posix says to be cancellation points (from sunos man
page):

aio_suspend(3R) for systems having AIO api (sunos),
close(2), creat(2), getmsg(2), getpmsg(2), lockf(3C),
mq_receive(3R), mq_send(3R), msgrcv(2), msgsnd(2),
msync(3C), nanosleep(3R), open(2), pause(2), poll(2),
pread(2), pthread_cond_timedwait(3T), pthread_cond_wait(3T),
pthread_join(3T), pthread_testcancel(3T), putmsg(2),
putpmsg(2), pwrite(2), read(2), readv(2), select(3C),
sem_wait(3R), sigpause(3C), sigwaitinfo(3R), sigsuspend(2),
sigtimedwait(3R), sigwait(2), sleep(3C), sync(3C),
system(3S), tcdrain(3), usleep(3C), wait(2), waitid(2) waitpid(2)
, wait3(3C), write(2), writev(2), and fcntl(2), when
specifying F_SETLKW as the command

and of course pthread_testcancel().
There may be more in future (posix standards should have already a wider
list than this).
Post by Marcin 'Qrczak' Kowalczyk
If I understand this correctly, then I like the interface. It makes
sense to provide "brackets" which set these values around a block
of code. Should explicit setting as statements be provided too, or
brackets are sufficient? Does cancel state bracket make sense only in
one direction: to disable cancellation? I guess cancel type bracket
makes sense in both ways.
Both.
Post by Marcin 'Qrczak' Kowalczyk
Is there a way to "clear" the pending cancellation flag for the
current thread? I tried to imagine how to implement a timeout for
blocking operations, and it seems that such clearing would help.
No, there's no way to clean it, but you can legally avoid fulfilling it.
Actually, setting CANCEL_DISABLE will prevent any pending flag to be
fulfilled up to the normal thread termination.
Post by Marcin 'Qrczak' Kowalczyk
What happens when a wait is being cancelled: does the thread reacquire
the mutex, since the code around wait assumes that the mutex is locked?
But it might be being locked by another thread. So I don't know how it
should work at all.
When a wait returns, both on timed wait elapse or on success, it reacquires
the lock. This may add an additional micro-time if there is contention on
the mutex, but the rule of a well written program is that normally there
should be low contention. When a condition is cancelled, the MUTEX IS
RE-ACQUIRED ANYHOW!; the thread does not just dies: it has the chance of a
clean death because its cleanup routine (see pthread_cleanup_push) is
called, and when it is entered, if the mutex is locked so that the cleanup
routine may commuincate safely with the resto of the program. As soon as
the cleanup routine returns, the thread is killed (and so the cleanup
routine MUST free the mutex). This imples that the cleanup routine MUST be
pushed when you enter a condition wait that may be cancelled, and that the
cleanup routine for non-wait cancellation points (i.e. read, writes etc.)
are usually different from the cleanup routine you have to push before
waiting for conditions.

Notice: this is a C-ish way to express the try/catch C++/C# construct, in
this case a catch ThreadInterruptedException construct. It is less elegant,
but quite effective.
Post by Marcin 'Qrczak' Kowalczyk
What happens when a wait is being cancelled: does the thread reacquire
the mutex, since the code around wait assumes that the mutex is locked?
But it might be being locked by another thread. So I don't know how it
should work at all.
I repeat the quote again to restate an important concept: MUTEX LOCK MUST BE
SHORT TIMED. i.e. acquiring locks and then calling functions is to be
avoided like plague. Acquiring locks should be done only to access data,
and not to call functions, expecially recursive functions. This is why
posix is reticent to allow recursive locks, and why the monitor model of
Java was so clumsy. C# model is better because it puts monitor locks on
data rater than on code sections (on code sections accessing particular
data, to be correct, but this is the defintion of a "theoretical" mutex, so
it's fine). Anyhow, the "candy" grammar of C# may encourage long-timed
mutex acquisitions, even if it's evidently studied for short timed ones, so
you must be careful using it.

Mutexes are NOT meant to "block other threads"; indeed this is an
undesirable side effect. Just, there is one chance over a million that two
threads are mangling the same data at the same time, and mutexes are there
to prevent this one-over-a-million situation to cause disasters (also,
mutex semantics have the desirable side effect to make local memory visible
around in multiple processor machines, but this is a "practical" effect
that is not directly related with the theoretical model, although is very
important). Since one-over-a-million means once per second in modern
computing, a program that would fail in that one-over-a-million situation
would break each second; THIS is what mutexes are for, they are NOT for
blocking threads(**), but to protect data integrity across them.

The fact is that the responsibility to use the mutexes correctly is on the
FINAL programmer, or on the "middleware" programmer, because only they are
in control of the time during which shared resources are mangled. C# sagely
returns this control on the final programmer, while Java wished to hide the
threading complexity deep into the lower level library. Anyhow, C# has
still the "latent" tendency (and puts the ability on the programmer) to
hide mutex locking at the lowest data level access: it can be done, but
EXTREMELY carefully.


Giancarlo.

(**) In fact, lock-free algos are meant to protect data from concurrent
access without blocking threads; also, spinlock waits are used in the
optimistic and often corret assumption that a few cpu cycles spent in idle
loops and checking if the mutex can be achieved are a great win over a full
"block" with relative context switch.
Marcin 'Qrczak' Kowalczyk
2004-10-05 14:21:15 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Mutex are MEANT to be locked for a small time as a nanosecond, more
means a broken logic.
This argument doesn't apply to timed wait for a condition.
In fact! Conditions wait are meant to WAIT, mutex lock waits are meant to
ACHIEVE THE LOCK.
Does this imply that timed wait makes more sense?

I would prefer to avoid it for simplicity.
Post by Giancarlo Niccolai
It's a widespread belief of this group that asynchronous
cancellation should NEVER be used UNLESS for very special reasons.
In higher level languages it's much less dangerous than in C or C++.

And when threads are implemented in userspace, then "any time" doesn't
mean "possibly when an object has been allocated but not initialized"
or such. The runtime will always be in a consistent state, only
application data might not. With semi-functional programming there is
little mutation, so even application data is usually consistent. And
with garbage collection there is no danger of orphaning just allocated
object.

There is some kind of asynchronous exception that you can't always
prevent: stack overflow. In my implementation the stack is physically
reallocated if the previously allocated amount is insufficient (this
implies that I will be able to give really small initial stack to
threads, so lots of threads will be possible), but at some point there
is simply no more memory. Currently the program dies, but I will try
to convert it to an exception. This is not system stack but emulated
stack. System stack is kept small all the time.

This means that you should be prepared for exceptions in unusual
places. But in a high level, semi-functional language, basic exception
safety is much easier to archieve than in C++.
Post by Giancarlo Niccolai
This is the list that posix says to be cancellation points (from sunos man
aio_suspend(3R) for systems having AIO api (sunos),
close(2), creat(2), getmsg(2), getpmsg(2), lockf(3C),
mq_receive(3R), mq_send(3R), msgrcv(2), msgsnd(2),
msync(3C), nanosleep(3R), open(2), pause(2), poll(2),
pread(2), pthread_cond_timedwait(3T), pthread_cond_wait(3T),
pthread_join(3T), pthread_testcancel(3T), putmsg(2),
putpmsg(2), pwrite(2), read(2), readv(2), select(3C),
sem_wait(3R), sigpause(3C), sigwaitinfo(3R), sigsuspend(2),
sigtimedwait(3R), sigwait(2), sleep(3C), sync(3C),
system(3S), tcdrain(3), usleep(3C), wait(2), waitid(2) waitpid(2)
, wait3(3C), write(2), writev(2), and fcntl(2), when
specifying F_SETLKW as the command
Ok, they seem to concide with blocking. Are there blocking functions
which are not cancellation points? I see pthread_mutex_lock is not on
the list, and indeed X/Open doesn't say it's a cancellation point.
Anything else?
Post by Giancarlo Niccolai
and of course pthread_testcancel().
I think you once said that pthread_testcancel() cancels a thread even
if cancellability is disabled, where X/Open says that it does not.
Which would be better?
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
If I understand this correctly, then I like the interface. It makes
sense to provide "brackets" which set these values around a block
of code. Should explicit setting as statements be provided too, or
brackets are sufficient? Does cancel state bracket make sense only in
one direction: to disable cancellation? I guess cancel type bracket
makes sense in both ways.
Both.
So, which of the following should be provided?

1. SetCancelState state // returns the old state
2. EnableCancel {code}
3. DisableCancel {code}
4. SetCancelType type // returns the old type
5. DeferredCancel {code}
6. AsynchronousCancel {code}

X/Open says:

"First, cancelability should only be disabled on entry to an object,
never explicitly enabled. On exit from an object, the cancelability
state should always be restored to its value on entry to the object."

This makes 2 is suspicious: if the surrounding context doesn't want
cancellation, it should not be enabled locally. But since in my case
cancellation will be translated to exceptions, enabling it locally can
be done safely.

GHC has 5 and 6 (under names block and unblock), it doesn't have 4. Of
course in C and C++ you can't have variants with a block of code, so
you must have 1 and 4. Do I need 1 and 4 if I have the others? Which of
these six should be provided?

I have mutex locking both ways: there is both Lock mutex {code},
the preferred way, and explicit Lock and Unlock, in case they are
not performed at the same place. I've read somewhere that it's good
that .NET provides explicit Monitor.Enter and Monitor.Exit, because
sometimes lock {code} is not sufficient. Am I right in providing
Lock in both flavors?
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Is there a way to "clear" the pending cancellation flag for the
current thread? I tried to imagine how to implement a timeout for
blocking operations, and it seems that such clearing would help.
No, there's no way to clean it, but you can legally avoid fulfilling
it. Actually, setting CANCEL_DISABLE will prevent any pending flag
to be fulfilled up to the normal thread termination.
Well, but it eventually happen when it's enabled again and the thread
waits for something.

So, how to implement I/O with a timeout in terms of primitives we are
talking about? The basic idea goes probably like this:

let current = CurrentThread;
let timeout = Thread {
Sleep time;
CancelThread current
};
ReadBlock file buffer blockSize;
CancelThread timeout;

but this has a race condition: the timeout can elapse when the main
thread is between ReadBlock and CancelThread, and the cancellation by
the timeout thread will be left pending. How to fix this?

Of course another problem is that we don't distinguish between the
timeout thread and some other thread cancelling us, but it's easy to
fix (the timeout thread will set a shared variable before cancelling).
So let's assume that nobody else is cancelling us.
Post by Giancarlo Niccolai
When a condition is cancelled, the MUTEX IS RE-ACQUIRED ANYHOW!;
Ok. And just after it acquires it, the exception is thrown, because
this is a natural way to express cancellation in languages like mine.
When the exception is not caught near the wait, the stack will unwind,
the mutex will be usually unlocked because a well-behaving code will
use Lock mutex {code} which ensures that it's unlocked on exit from
the code, like try...finally in some languages.
Post by Giancarlo Niccolai
This is why posix is reticent to allow recursive locks, and why the
monitor model of Java was so clumsy.
Ah, recursive locking. What do you recommend?

As I see, Posix provides all options, including undefined behavior.
I will not have undefined behavior, which leaves the following options:

1. No special handling of this case, i.e. deadlock.
2. An exception is thrown to signal a program error.
3. A counter is increased etc. like in Java and C#.
4. A way to choose one of the above (or of the subset of the above).

I would strongly prefer to avoid 4. Which should I choose?

While we are at deadlocks, there are cases when a deadlock can be
detected, with the help of the garbage collector. Reachability of data
can be extended to threads; threads which are running or threads which
wait for external resources like blocking I/O are alive by themselves,
but stopped threads are kept alive by concurrency objects which are
able to eventually wake them up. When a mutex or condition on which
a thread is sleeping is unreachable by running or potentially running
threads, we know that this thread will never be woken up. What now?

1. Silently garbage collect it.
2. Throw an exception in this thread to signal a program error. When
the exception is propagated to thread toplevel, it will probably be
printed with a stack trace, so the programmer can fix that.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-05 16:20:08 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Mutex are MEANT to be locked for a small time as a nanosecond, more
means a broken logic.
This argument doesn't apply to timed wait for a condition.
In fact! Conditions wait are meant to WAIT, mutex lock waits are meant to
ACHIEVE THE LOCK.
Does this imply that timed wait makes more sense?
Well, they do make sense, indeed. They are probably not vital, but it's
highly recommended you have one.
Post by Marcin 'Qrczak' Kowalczyk
I would prefer to avoid it for simplicity.
Post by Giancarlo Niccolai
It's a widespread belief of this group that asynchronous
cancellation should NEVER be used UNLESS for very special reasons.
In higher level languages it's much less dangerous than in C or C++.
I'm sorry, but it is EXACTLY the other way around. In higher level language
is MUCH more dangerous than in C or C++, exactly because the higher level
Post by Marcin 'Qrczak' Kowalczyk
And when threads are implemented in userspace, then "any time" doesn't
mean "possibly when an object has been allocated but not initialized"
or such. The runtime will always be in a consistent state, only
application data might not. With semi-functional programming there is
little mutation, so even application data is usually consistent. And
with garbage collection there is no danger of orphaning just allocated
object.
At "any time may" be (with 90% of incidence) exactly in the middle of a
opcode evaluation.

If you mean to use a "emulated" asynchronous cancellation, that is use a
physical deferred cancellation that is checked at i.e. each VM loop,
"seeming" like async cancellation to the final programmer, that's a little
better, but I would say just a little. In example, the target thread may be
in charge to issue a signal for a waiting thread, or be in charge to
produce some consistent output, like i.e. a log entry. Killing it abruptly
(even if the VM is in a consistent state) means that it will fail to notify
the other waiting thread, or to complete the log entry writing (doing i.e.
half of it). So, your program may not crash, but it is anyhow at risk of
deadlocks and undefined behavior.
Post by Marcin 'Qrczak' Kowalczyk
There is some kind of asynchronous exception that you can't always
prevent: stack overflow.
They occur INSIDE the thread that is going to terminate. Asynchronous
cancellations means that thread A just kills B and B just gets killed
without any recovery hope.
Post by Marcin 'Qrczak' Kowalczyk
In my implementation the stack is physically
reallocated if the previously allocated amount is insufficient (this
implies that I will be able to give really small initial stack to
threads, so lots of threads will be possible), but at some point there
is simply no more memory. Currently the program dies, but I will try
to convert it to an exception. This is not system stack but emulated
stack. System stack is kept small all the time.
Exactly: the stack exception is synchronous, in this case: it's generated
inside the thread that must handle the exception,
Post by Marcin 'Qrczak' Kowalczyk
This means that you should be prepared for exceptions in unusual
places. But in a high level, semi-functional language, basic exception
safety is much easier to archieve than in C++.
It's just a false safety feeling: your program won't crash, it just will do
wrong in a smaller degree. Doing "a little less wrong" things must not be a
design goal; mathematical correctness is the only thing that can keep a
threading app to run.
Post by Marcin 'Qrczak' Kowalczyk
Ok, they seem to concide with blocking. Are there blocking functions
which are not cancellation points? I see pthread_mutex_lock is not on
the list, and indeed X/Open doesn't say it's a cancellation point.
Anything else?
pthread_mutex_lock is not a blocking function. It is not meant to block your
program.
Post by Marcin 'Qrczak' Kowalczyk
I think you once said that pthread_testcancel() cancels a thread even
if cancellability is disabled, where X/Open says that it does not.
Which would be better?
No, I didn't say that.
Post by Marcin 'Qrczak' Kowalczyk
So, which of the following should be provided?
1. SetCancelState state // returns the old state
2. EnableCancel {code}
3. DisableCancel {code}
4. SetCancelType type // returns the old type
5. DeferredCancel {code}
6. AsynchronousCancel {code}
I suggest that the default thread is created with deferred cancellation
enabled. So you would need:
DisableCancel {}
AsynchronousCancel{}
Post by Marcin 'Qrczak' Kowalczyk
I have mutex locking both ways: there is both Lock mutex {code},
the preferred way, and explicit Lock and Unlock, in case they are
not performed at the same place. I've read somewhere that it's good
that .NET provides explicit Monitor.Enter and Monitor.Exit, because
sometimes lock {code} is not sufficient. Am I right in providing
Lock in both flavors?
Yes. But please, don't do as M$ does: they mangle the common names to make
less informed ppl thing they invented things. It's not a monitor. Is a
mutex. Mutex.Lock. This is a monitor construct:

Lock(object) {
}

and that is a monitor because it automatically monitorizes Object with a
mutex.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Is there a way to "clear" the pending cancellation flag for the
current thread? I tried to imagine how to implement a timeout for
blocking operations, and it seems that such clearing would help.
No, there's no way to clean it, but you can legally avoid fulfilling
it. Actually, setting CANCEL_DISABLE will prevent any pending flag
to be fulfilled up to the normal thread termination.
Well, but it eventually happen when it's enabled again and the thread
waits for something.
But you may also never re-enable it.
Post by Marcin 'Qrczak' Kowalczyk
So, how to implement I/O with a timeout in terms of primitives we are
let current = CurrentThread;
let timeout = Thread {
Sleep time;
CancelThread current
};
ReadBlock file buffer blockSize;
CancelThread timeout;
but this has a race condition: the timeout can elapse when the main
thread is between ReadBlock and CancelThread, and the cancellation by
the timeout thread will be left pending. How to fix this?
This is not a race condition. If the cancellation is late, it is just late
and nothing is broken. Either the read will be done before timeout fires,
or it won't be killed and will kill in turn the timeout. Of course, the
timeout may have already fired (in late) the current thread, but this is
fine. It's the logic of THIS program that would not be an example of good
programming, while the primitives work right.
Post by Marcin 'Qrczak' Kowalczyk
Ah, recursive locking. What do you recommend?
They are a must; they are Evil, and prominent Evil, but some program may be
well written even if using it. I don't use them, personally, and I
discourage everyone from using them, but if they are well used, they can
work.
Post by Marcin 'Qrczak' Kowalczyk
As I see, Posix provides all options, including undefined behavior.
1. No special handling of this case, i.e. deadlock.
2. An exception is thrown to signal a program error.
3. A counter is increased etc. like in Java and C#.
4. A way to choose one of the above (or of the subset of the above).
I would strongly prefer to avoid 4. Which should I choose?
You can't catch all the programmer stupidity. Some of them is just
irreducible. So, go for 1 and document it well: "if you do this thing, you
need a brain exchange" (like i.e. the self cancelling race above: it is
good to show the usage of the primitives, and works, but it's very bad as a
program).
Post by Marcin 'Qrczak' Kowalczyk
While we are at deadlocks, there are cases when a deadlock can be
detected, with the help of the garbage collector. Reachability of data
can be extended to threads; threads which are running or threads which
wait for external resources like blocking I/O are alive by themselves,
but stopped threads are kept alive by concurrency objects which are
able to eventually wake them up. When a mutex or condition on which
a thread is sleeping is unreachable by running or potentially running
threads, we know that this thread will never be woken up. What now?
Some deadlocks can be detected. Other not. I would not risk to try to detect
all the mess a programmer can do; you'll end up to do it right, but
limiting the freedom of the programmer an thus the usefulness of the
language. Then your language may end up in being a nice toy language.

Well, this is a personal opinion, anyhow, there is no theoretical or
mathematical reason why you would not be able to detect and correct
deadlocks. It theoretically works: just I won't use it, but others might.

Giancarlo Niccolai.
Marcin 'Qrczak' Kowalczyk
2004-10-05 17:50:59 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Does this imply that timed wait makes more sense?
Well, they do make sense, indeed. They are probably not vital,
but it's highly recommended you have one.
Ok, I will add it (but after I got it working at all).
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
In higher level languages it's much less dangerous than in C or C++.
I'm sorry, but it is EXACTLY the other way around. In higher level
language is MUCH more dangerous than in C or C++, exactly because
the higher level developer has less control over low level
If threads are scheduled by the OS, then it's hard to implement a high
level language at all. It is possible, but thread-safe garbage collection
is very hard. That is, if we want it to be efficient; obtaining a lock
for each allocation is not an option. It also rules out global variables
used internally by the runtime, if they aren't truly global for all
threads.

But I was talking about threads implemented in userspace, where a
timer tick forces a reschedule in the next safe point (in my case it's
going to be when a stack frame is pushed), and blocking I/O is emulated
by non-blocking I/O and a central poll/select loop in the scheduler.
Post by Giancarlo Niccolai
pthread_mutex_lock is not a blocking function. It is not meant to
block your program.
Technically it does block, even if it's meant to be a short time.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
I think you once said that pthread_testcancel() cancels a thread even
if cancellability is disabled, where X/Open says that it does not.
Which would be better?
No, I didn't say that.
Ok, so TestCancel() will do something (throw an exception) only if
cancellation is enabled.
Post by Giancarlo Niccolai
I suggest that the default thread is created with deferred
cancellation enabled.
I was planning exactly that, as Posix does.
Post by Giancarlo Niccolai
DisableCancel {}
AsynchronousCancel{}
Ok, these are needed for sure, because they turn on the non-default
choices. Of course they will nest properly, and when cancellation is
disabled then the choice of deferred vs. asynchronous doesn't matter.

Are you sure that the rest is not needed? Well, ok.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Am I right in providing Lock in both flavors?
Yes. But please, don't do as M$ does: they mangle the common names to make
less informed ppl thing they invented things. It's not a monitor. Is a
Lock(object) {
}
Sorry, the syntax of my language requires the function name to go first.
Or you could say mutex->Lock {...}, it's the same. But this is a named
function applied to two arguments: a mutex and a block of code
(function closure).

Here are the names I'm planning (lowercase words are placeholders for
expressions). Any better name choices? Any fundamental operations missing?
Any mistakes?

Thread {code} // start a thread, return thread ID
JoinThread thread
CurrentThread // behaves like a thread-local variable

Mutex() // make a mutex
Lock mutex {code}
Lock mutex
Unlock mutex
TryLock mutex

Condition mutex // make a condition
Wait condition
TimedWait condition time
Signal condition // broadcast
Signal1 condition

CancelThread thread
DisableCancel {code}
AsynchronousCancel {code}
TestCancel()

Yield() // give some CPU to other threads
Sleep time

I guess CancelThread should return immediately, even if it needs some
time to reach a cancellation point?

There are no detached threads because of garbage collection. If you
forget about a thread ID, and it terminates, then it vanishes. Any
number of threads can join a thread. They can join it even after
it has terminated. If it terminated with an unhandled exception,
JoinThread will throw the same exception.

I'm not sure what should happen with unhandled exceptions in threads.
Note that an unhandled exception in the main thread runs a global
handler which by default dumps a stack trace on stderr. But what about
exceptions in threads? If they are just recorded for whoever does
JoinThread, then they will be lost silently if nobody does JoinThread.
OTOH dumping a stack trace when the program as a whole is not going to
die seems silly. Should there be thread-local handler, similarly for
the handler for the main thread? What it should do by default?

Another problem. What should the program do when the main thread has
finished but some other threads have not?
1. Wait for them to finish (or wait forever if they deadlocked).
2. Just kill them at once. The programmer should avoid this.
3. Cancel them as if with CancelThread, wait for them to finish.
4. Try to let the program go into the background by doing fork(). This
has bad side effects, e.g. pid is changed, and children processes
are lost, but maybe it's the closest approximation of what the
programmer had in mind.

Another issue: Unix signals. Of course I, the implementor, must handle
them specially, because too little is physically possible to be
executed from them directly. But what should be the effect? Does it
make sense to start a new thread for a signal?
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Well, but it eventually happen when it's enabled again and the thread
waits for something.
But you may also never re-enable it.
But it wants to re-enable it without side effects.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
let current = CurrentThread;
let timeout = Thread {
Sleep time;
CancelThread current
};
ReadBlock file buffer blockSize;
CancelThread timeout;
but this has a race condition: the timeout can elapse when the main
thread is between ReadBlock and CancelThread, and the cancellation by
the timeout thread will be left pending. How to fix this?
This is not a race condition. If the cancellation is late, it is just late
and nothing is broken. Either the read will be done before timeout fires,
or it won't be killed and will kill in turn the timeout.
It is a race condition. There is a small but non-zero probability that
the timeout fires after ReadBlock completed, but before CancelThread.
So it finds the thread not being blocked, and marks the cancellation
as pending. The timeout thread is killed (it just finished anyway),
the code continues, and in some random point in the future, the first
time it waits for something, the pending cancellation triggers and
the thread is cancelled.

The only way to implement this correctly seems to be to add at the end:

try {
TestCancel()
}
(when Is THREAD_CANCEL) {};

to clear the pending cancellation flag. But it looks silly. Is there
a better way of emulating I/O with timeout? This is a general problem:
substitute any other activity for I/O.
Post by Giancarlo Niccolai
You can't catch all the programmer stupidity. Some of them is just
irreducible. So, go for 1 and document it well: "if you do this
thing, you need a brain exchange" (like i.e. the self cancelling
race above: it is good to show the usage of the primitives, and
works, but it's very bad as a program).
Ok, maybe someone will kill me, but I will make mutexes non-recursive.
Locking it again will deadlock this thread.

Unlocking a mutex which is not locked, or is locked by some other
thread, throws an exception. As a principle I avoid undefined behavior
if it can be checked in O(1) time. It's all not super-fast anyway.
It's not very bad either: the compile is compiled into C, not
interpreted.
Post by Giancarlo Niccolai
Some deadlocks can be detected. Other not.
Sure. But for detected deadlocks there comes a question whether
throwing an exception and allowing the code to report the bug
gracefully is good, or maybe it's not a bug but the programmer for
some silly reason intended it to deadlock and doing anything else
would be incorrect?
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-06 23:48:05 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Sorry, the syntax of my language requires the function name to go first.
Or you could say mutex->Lock {...}, it's the same. But this is a named
function applied to two arguments: a mutex and a block of code
(function closure).
That's actually even better.
Post by Marcin 'Qrczak' Kowalczyk
Condition mutex // make a condition
Wait condition
Wait condition, mutex // you need a mutex to do it clean.
Post by Marcin 'Qrczak' Kowalczyk
TimedWait condition time
Signal condition // broadcast
Signal1 condition
CancelThread thread
DisableCancel {code}
AsynchronousCancel {code}
TestCancel()
Yield() // give some CPU to other threads
A bad idea; It would be a good idea if you had coroutining (that is, totally
emulated threading via VM abstraction). Are you using system API or are you
"emulating" threading via VM opcount slices? In this second case, then
Yield is good.
Post by Marcin 'Qrczak' Kowalczyk
Sleep time
I guess CancelThread should return immediately, even if it needs some
time to reach a cancellation point?
Yes there's JoinThread to wait for a cancelled thread to terminate, if
needed.
Post by Marcin 'Qrczak' Kowalczyk
There are no detached threads because of garbage collection. If you
forget about a thread ID, and it terminates, then it vanishes. Any
number of threads can join a thread. They can join it even after
it has terminated. If it terminated with an unhandled exception,
JoinThread will throw the same exception.
Ok.
Post by Marcin 'Qrczak' Kowalczyk
I'm not sure what should happen with unhandled exceptions in threads.
Note that an unhandled exception in the main thread runs a global
handler which by default dumps a stack trace on stderr. But what about
exceptions in threads? If they are just recorded for whoever does
JoinThread, then they will be lost silently if nobody does JoinThread.
OTOH dumping a stack trace when the program as a whole is not going to
die seems silly. Should there be thread-local handler, similarly for
the handler for the main thread? What it should do by default?
I would see better a VM related error handler; you want to have another
handler? then have another VM. All the threads handled in a VM should die
the same way they should not be allowed to override the VM ability to
handle them.
Post by Marcin 'Qrczak' Kowalczyk
Another problem. What should the program do when the main thread has
finished but some other threads have not?
1. Wait for them to finish (or wait forever if they deadlocked).
2. Just kill them at once. The programmer should avoid this.
3. Cancel them as if with CancelThread, wait for them to finish.
4. Try to let the program go into the background by doing fork(). This
has bad side effects, e.g. pid is changed, and children processes
are lost, but maybe it's the closest approximation of what the
programmer had in mind.
I would go for 2: the programmer should avoid this IF IT DOES NOT WANT THIS.
It may indeed be an easy and safe way to span some threads, wait some time
and kill safely and surely all of them. It's easy (or even necessary) to
join all the threads if the main thread wishes; if it does not, just let
the process die. A fork can be done explicitly, so as wait, fork and cancel
are option that the programmer can implement by itself (2) is left so (2)
should be done.
Post by Marcin 'Qrczak' Kowalczyk
Another issue: Unix signals. Of course I, the implementor, must handle
them specially, because too little is physically possible to be
executed from them directly. But what should be the effect? Does it
make sense to start a new thread for a signal?
No. It make sense to start a thread to handle ALL the signals. I did a
similar thing for xharbour language; you may wish to look here:

http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/vm/thread.c?rev=1.177&view=auto
http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/rtl/hbserv.c?rev=1.21&view=markup

The second file handles fork and signal masking.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Well, but it eventually happen when it's enabled again and the thread
waits for something.
But you may also never re-enable it.
But it wants to re-enable it without side effects.
If there aren't cancellation points up to the end of the thread, then there
aren't side effect.

Else, you should implement "kind cancellation" by setting a flag in the
thread data, a flag that would be read by the VM and would then terminate
the thread.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
This is not a race condition. If the cancellation is late, it is just
late and nothing is broken. Either the read will be done before timeout
fires, or it won't be killed and will kill in turn the timeout.
It is a race condition. There is a small but non-zero probability that
the timeout fires after ReadBlock completed, but before CancelThread.
This does not configure a race condition. A race condition implies that the
race must be destructive in some way: causing data loss or deadlock. In
your case none of this happens, this is also referred as "harmless race
condition".
Post by Marcin 'Qrczak' Kowalczyk
So it finds the thread not being blocked, and marks the cancellation
as pending. The timeout thread is killed (it just finished anyway),
the code continues, and in some random point in the future, the first
time it waits for something, the pending cancellation triggers and
the thread is cancelled.
But this is the implied behavior of the code you wrote. It's, so to speak, a
correct behavior.
Post by Marcin 'Qrczak' Kowalczyk
try {
TestCancel()
}
(when Is THREAD_CANCEL) {};
to clear the pending cancellation flag. But it looks silly. Is there
substitute any other activity for I/O.
As I said, do-it-yourself: provide a kind of VM driven thread cancellation
request.
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
You can't catch all the programmer stupidity. Some of them is just
irreducible. So, go for 1 and document it well: "if you do this
thing, you need a brain exchange" (like i.e. the self cancelling
race above: it is good to show the usage of the primitives, and
works, but it's very bad as a program).
Ok, maybe someone will kill me, but I will make mutexes non-recursive.
Locking it again will deadlock this thread.
No one will kill you: this is the preferred choice here.
Post by Marcin 'Qrczak' Kowalczyk
Sure. But for detected deadlocks there comes a question whether
throwing an exception and allowing the code to report the bug
gracefully is good, or maybe it's not a bug but the programmer for
some silly reason intended it to deadlock and doing anything else
would be incorrect?
No, a deadlock is just an error. So if you can detect it and throw an
exception, do it.

I just suggest you to provide two kind of exceptions: stoppable and
unstoppable. The latter may be only caught by the VM, that would pass it to
a application level (not script level) handler or print a stack trace and
exit if the handler has not been set. The script should not be allowed to
catch some exceptions that are outside its recovery ability: a script can't
recover from a deadlock.

Anyhow, i.e. python does not agree as it catches even syntax errors while
compiling... this allow to build powerful handlers in the scripts to i.e.
send a mail with python routines if an hard error arises. I don't like this
very much, as IMHO it hides more problems than the ones it solves.

....

Please, before replying have also a look at my last article:

http://www.niccolai.ws/works/articoli/art-multithreading-en-1a.html

.....

Finally, how much is advanced your language? Incidentally, I am developing
Haste at

http://www.haste.ws

And, of course, one of its strength has to be MT. I am searching for
cooperation, and as you have the same interest in making languages we may
find an agreement ;-).

I have not written the RTL yet, still writing the "perfect" GC, but you may
use it also as a reference.

Bests,
Giancarlo.
Marcin 'Qrczak' Kowalczyk
2004-10-07 11:03:12 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Condition mutex // make a condition
Wait condition
Wait condition, mutex // you need a mutex to do it clean.
The mutex is implicit in the condition, so I don't see a point in
forcing the user to redundantly specify it. There is always only one
value of the argument which makes sense.

An exception will be thrown when the mutex is not held by this thread,
so mistakes in using a wrong mutex should be detected (except when the
other mutex is locked too, but I don't believe it's so common mistake
to warrant another function argument).
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Yield() // give some CPU to other threads
A bad idea; It would be a good idea if you had coroutining (that is,
totally emulated threading via VM abstraction). Are you using system
API or are you "emulating" threading via VM opcount slices? In this
second case, then Yield is good.
I am "emulating" threading (I think I've told this). I have to do this
because:
1. Multithreaded garbage collection is hard.
2. The runtime is using global variables for performance-critical tasks
(e.g. argument passing), making them thread-local would be bad.

And it's easy in my case, because I internally use continuation-passing
style and custom stack to have proper tail calls and precise GC. The
timer asks for a context switch by faking a stack overflow condition.
The code, which checks for stack overflow anyway, detects that it was
a request for context switch and proceeds accordingly.

I'm here breaking strict ISO/ANSI C rules: the signal handler modifies
global volatile variables of pointer type, not just sig_atomic_t.
I hope it works on all interesting architectures. It would be too
painful to do otherwise because it would require an explicit check.

The only cost is that I had to disable the optimization of allocating
the stack frame as late as possible. The optimization allowed it to
be avoided altogether in some branches of control flow (it is often
possible to do without a stack frame if we aren't calling other
functions in non-tail-call positions, because temporary data can be
held in virtual registers emulated in global variables, or even in
plain local C variables if GC can't happen during working with them).

Unfortunately not all places where it got allocated were safe for GC,
so they were not safe for a context switch either. And it would
inconvenient to restart in the middle of a C function, it would have
to be split at this point (it's split anyway in lots of places, but
not necessarily when the stack frame is allocated). With my virtual
register allocation algorithm it was too hard to ensure that the stack
frame is allocated in safe places only, so now I allocate it at the
beginning of each function which needs it at all. This caused a
slowdown by 3% (measuring the compiler itself).

My runtime is similar to a VM in that it uses virtual registers,
custom stack, and custom calling convention, but the code is compiled
into C, there are no opcodes for VM instructions.

I know advantages of native threads: that blocking I/O is executed
in parallel without having to emulate it using non-blocking I/O, that
multiple processors are utilized by a single process, and that context
switches don't need to be delayed until a safe point (which will
generally happen very soon, but it may be long in case of long
functions implemented in C, for examble bignum arithmetic is atomic
wrt. context switches). Anyway, I have little choice. I consider both
approaches viable, with different strengths (I would guess my threads
are more lightweight than OS threads).
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
I'm not sure what should happen with unhandled exceptions in
threads. Note that an unhandled exception in the main thread runs a
global handler which by default dumps a stack trace on stderr. But
what about exceptions in threads? If they are just recorded for
whoever does JoinThread, then they will be lost silently if nobody
does JoinThread. OTOH dumping a stack trace when the program as a
whole is not going to die seems silly. Should there be thread-local
handler, similarly for the handler for the main thread? What it
should do by default?
I would see better a VM related error handler; you want to have
another handler? then have another VM. All the threads handled in a
VM should die the same way they should not be allowed to override
the VM ability to handle them.
Sorry, I don't understand.

First, it is possible to override the global error handler, and I see
nothing wrong with that. For example it makes sense for a CGI program
to dump the message and stack trace to the HTML page it produces. This
handler is simply a function held in a global variable (variable on
the Kogut side, not C side).

With an unhandled toplevel exception the program is going to
eventually die no matter what the handler does, but the handler can
choose a better medium for communicating the message than stderr,
or format the stack trace differently.

The stack trace is available only in that handler, but not in ordinary
'try', because it's materialized only when needed. An exception which
is caught is very fast. This means that 'try' around the bulk of the
program is not a substitute for the toplevel error handler.

The function ExitProgram generates an exception which can be caught
in the normal way; the main purpose is for Finally clauses to execute.
This exception is special in one way: it does not cause materializing
the stack trace and calling the toplevel error handler. If it reaches
the toplevel, the program is just terminated.

Now we have threads. Thread cancellation is similar to exiting the
program, only on the thread level, and it differs that it may be
generated from outside. So this exception should just terminate the
thread silently, after it has been propagated to *its* toplevel.

But what about other exceptions? If they are not caught in the thread,
they usually are bugs or I/O errors which have not been checked.
If someone joins the thread, he will see the exception; if this is the
main thread and it does not catch exceptions thrown by JoinThread,
the exception will terminate the whole program with a stack trace dump
(this definitely shows a bug in the program: it *should* have been
caught at some point, either in the thread or after joining it).

But if nobody joins it, the exception will be silently discarded.
Errors should not be discarded silently! So, what to do with them?
Having a similar thread-local handler which prints them on stderr
by default (except thread cancellation) has two problems:

- It may be printed in the middle of the computation (what is worse,
StdErr is currently not thread-safe, because it's buffered on the
Kogut side using pure Kogut code, not atomic C code).

- If I choose to propagate it to JoinThread after it has been printed
on StdErr, it will be printed twice if not caught in the main thread.
This makes no sense.

Options I imagine:

1. Currently it's just recorded in the thread state and propagated by
JoinThread. Leave it this way.

2. Have a thread-local toplevel error handler, but which defaults to
rethrowing the exception, with the effect like above. Somebody can
install a handler which dumps error information to StdErr instead.

3. Like 2, but reverse the default: it's usually printed with a stack
trace. Problem: what JoinThread should return? If it's some normal
value like Null, it may confuse the computation which expects
something different. If it rethrows the exception, it will be often
reported twice.

Anything else? If I see no other way, I will leave it as it is. This
means that the error is silently discarded if nobody does JoinThread.
One can insert an explicit 'try' around the thread body and do
something else, he only can't obtain a stack trace.

Note that (when I implement thread cancellation) the exception to
cancel a thread will be propagated like others. This means that if
a thread has been cancelled, and another thread performs JoinThread
without catching exceptions, it will be cancelled as well. I guess
this is fine.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Another problem. What should the program do when the main thread has
finished but some other threads have not?
1. Wait for them to finish (or wait forever if they deadlocked).
2. Just kill them at once. The programmer should avoid this.
3. Cancel them as if with CancelThread, wait for them to finish.
4. Try to let the program go into the background by doing fork().
This has bad side effects, e.g. pid is changed, and children
processes are lost, but maybe it's the closest approximation
of what the programmer had in mind.
I would go for 2: the programmer should avoid this IF IT DOES NOT WANT THIS.
This is indeed what happens when I do nothing special about it.

I feel uneasy about abrupt termination of computation, without even
running Finally clauses, but OK - the programmer can indeed program
other behaviors by hand.

If the main thread is terminated by an uncaught exception, it makes
sense to report the error and abort, even if other threads are doing
some work.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Another issue: Unix signals. Of course I, the implementor, must handle
them specially, because too little is physically possible to be
executed from them directly. But what should be the effect? Does it
make sense to start a new thread for a signal?
No. It make sense to start a thread to handle ALL the signals. I did a
http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/vm/thread.c?rev=1.177&view=auto
http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/rtl/hbserv.c?rev=1.21&view=markup
The second file handles fork and signal masking.
Digging into a complex code will take me some time. I hope you don't
mind if we talk about indented effects before I dig into this
implementation.

So you are saying that it should behave analogously to C, i.e.
temporarily suspend the computation of that thread until the signal
handler returns? OK, the programmer can start a thread from the
handler himself if it wants to wait for some condition or such.

How does the programmer designate the thread? By execution of some
function ("I want to handle signals") in this thread? It defaults to
the main thread of course.

The handler will be physically started at a safe point, which in my
case means when a context switch may occur: at the beginning of any
function which allocates a stack frame. I think I can let it behave as
if it was an asynchronous function call, i.e. the handler can return
(its result is ignored and computation resumes) or throw an exception
(it is thrown into the interrupted code). So ExitProgram from the
handler will behave in the expected way.

What if the thread which handles signals is blocked on a condition
variable? If I don't do anything special, it will handle the signal
after it is unblocked, i.e. after the condition is satisfied, and
until that time it's just recorded somewhere as pending.

When the signal arrives several times before we have a chance to
handle it, should the handler be run multiple times or just once?
Or maybe I should physically mask it as soon as it arrives, and
unblock before or after the signal handler, so I don't have to worry
and the OS will choose whether to fire it again when it's unblocked?
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
But it wants to re-enable it without side effects.
If there aren't cancellation points up to the end of the thread,
then there aren't side effect.
Let's assume that there are.
Post by Giancarlo Niccolai
Else, you should implement "kind cancellation" by setting a flag in
the thread data, a flag that would be read by the VM and would then
terminate the thread.
I'm not talking about implementation but intended behavior. Since I
implement threads myself, the cancellation will be as kind as needed.
It will be terminated in a safe point according to cancellation rules
(during a blocking operation by default, which can be temporarily
changed to "never" or "any time").

But this doesn't help. The cancellation policy is the default in this
case, in order to interrupt blocking I/O (of course it's non-blocking
I/O under the hood, it's not the point). So it will interrupt the read,
but possibly it will interrupt something else if it fires too late.
I want to interrupt the read, but not anything later.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
It is a race condition. There is a small but non-zero probability that
the timeout fires after ReadBlock completed, but before CancelThread.
This does not configure a race condition. A race condition implies
that the race must be destructive in some way: causing data loss or
deadlock.
It is destructive: it causes to terminate a thread that I don't want
to be terminated.
Post by Giancarlo Niccolai
But this is the implied behavior of the code you wrote. It's, so to
speak, a correct behavior.
The code was deliberately wrong to show what I'm trying to archieve
and what the problem is. I know that this is implied - that's the
problem! I want to implement a timeout for an I/O operation in terms
of the threading primitives my language will provide. The thread which
wants to perform this timed I/O must not permanently change its state
to "cancellation is pending". It must either return with the answer
"the I/O occured on time, here is what was read" or "the I/O could not
be performed before the timeout".

I can distinguish the effect of the read (because it's being read into
an array which knows how much of it is filled), but I don't know how
to safely interrupt the read without the risk of interrupting some
random blocking operation that will be performed later.
Post by Giancarlo Niccolai
As I said, do-it-yourself: provide a kind of VM driven thread
cancellation request.
I'm asking how to *use* these kind cancellation requests to implement
timed I/O.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Sure. But for detected deadlocks there comes a question whether
throwing an exception and allowing the code to report the bug
gracefully is good, or maybe it's not a bug but the programmer for
some silly reason intended it to deadlock and doing anything else
would be incorrect?
No, a deadlock is just an error. So if you can detect it and throw
an exception, do it.
I changed my mind here.

Erlang doesn't provide mutable data structures. But they can be
emulated with threads with local state if somebody needs them.

Similarly in my Kogut it will be possible to implement data structures
by threads which are listening to messages, even though of course it's
not needed to have plain mutable variables. When someone nevertheless
makes an "object" with an associated thread, it should behave sanely.

Now, what should happen if he makes such object, uses it, then forgets
about it? From the point of view of the user of such object it's an
ordinary garbage collectible object. He might even not know that it's
implemented using a thread internally. The thread will be left blocked
on a condition variable inside some message queue structure, and the
runtime will at some point determine that all this is garbage because
the condition variable is not reachable by any running or potentially
running thread.

So the thread should not be disturbed by being wakened back into life
and trying to respond to an exception. It's just garbage collected.
Now I'm sure that this is right.

A separate case is a total deadlock when all threads are blocked on
mutexes, conditions and joining, there are no threads running nor
blocked on I/O or timeout, so there is no chance that anything
will change. It happens that this would break the invariant of the
scheduler that the ring of running and potentially running threads is
non-empty. So I have little choice but to abort the program with an
error message written to stderr, and this is not recoverable because
there is no thread which could handle the recovery.
Post by Giancarlo Niccolai
I just suggest you to provide two kind of exceptions: stoppable and
unstoppable. The latter may be only caught by the VM, that would
pass it to a application level (not script level) handler or print a
stack trace and exit if the handler has not been set. The script
should not be allowed to catch some exceptions that are outside its
recovery ability: a script can't recover from a deadlock.
There is no application vs. script. A program in Kogut is compiled to
an ordinary executable, and it's the only thing being run.

At some time it will be possible to embed it in some other program,
but currently there is no interpreter to execute code which has just
been processed. It's being compiled into C, a C compiler is called
to compile it, and its assembler output is mangled to make function
calls faster (tail calls require a different calling convention).

Unfortunately the generated code is quite big, so the executables
are big (I don't use dynamically loaded libraries yet) and 90% of
the compilation time is used by gcc.
Post by Giancarlo Niccolai
http://www.niccolai.ws/works/articoli/art-multithreading-en-1a.html
You say: "Service parallelism: when the target application has to
fulfill some service duty that may be invoked by several sources at
the same time, using parallelism improves performances. We'll see in
the next articles that the correct model when implementing service
parallelism is not that of creating an agent for every request that
must be evaded. This solution (one task, one agent) would seem
natural, but in reality this would be a sure way to destabilize the
application and reduce the overall system performance, unless it can
be demonstrated that there is a maximum (low) amount of possible
concurrent requests."

I would disagree in the case of green threads. The structures the
programmer would have to create in order to multiplex several message
sources are exactly what the threading subsystem implements. My thread
structure currently takes 26 words, i.e. 104 bytes on 32-bit machines,
plus the stack, which in case of newly created threads starts with 256
bytes and grows as needed. My GC in the worst case doubles the memory
usage of plain objects (two generations, both copying), but stacks are
managed differently and are not doubled (except when they overflow).
This means about 0.5 kB per thread if it doesn't use much of stack
during its life (a stack never shrinks). So with 500 MB of memory,
assuming that half of memory will be used for other purposes (e.g. the
data threads operate on, and other processes in the system), I could
create about 500 thousands of threads.

Well, I can't imagine what they could do. The system will not let open
that many files, and even if it did, the poll()/select() loop will
probably kill performance. My scheduler will need time and temporary
space proportional to the number of threads which are running,
sleeping on timeout or sleeping on I/O, unless the next thread in this
ring is running, in which case it takes O(1). It must save 13 global
variables in the old thread, reload them from the new thread, and
perform some administrative computations to see what actually needs
to be done and to which threads.

I'm mixing present and future tense because I started implementing
threads a week ago and they are already partially working. What works:
threads as such, their garbage collection, timer ticks (using
setitimer internally), JoinThread, mutexes, Yield, calbacks from C
mixed with threads. What remains to be done: conditions, I/O (blocking
emulated with non-blocking, with poll()/select() in the scheduler),
Sleep, cancellation, thread-local variables, convenient high-level
threading constructs like message queues, signals, TryLock, maybe
thread priorities (but I think I will leave them into indefinite
future, I don't know how they should work anyway).

Windows would probably need WaitForMultipleObjects or something like
this instead of poll()/select(). I'm leaving this for future because I
don't know WinAPI. I haven't tried to compile my compiler on Windows
yet.

poll()/select() is not sufficient for waiting for a child process to
terminate. If one thread is waiting for a process and another thread
is waiting for I/O, OCaml performs a semi-busy wait (a poll()/select()
with a short timeout, interleaved with non-blockign waitpid()). I will
have to do the same.

Callbacks from C, which are fortunately very rare, present an
interesting challenge. With green threads it's possible that callbacks
made in separate threads try to return in a different order than the
reverse of their starts, which is impossible to implement because
their stack frames are on the same system stack and in the wrong
order. My implementation temporarily blocks threads whose callbacks
want to return out of order, until the appropriate returning thread is
found. For some programs it just works. For others it might cause a
deadlock if the right thread waits for a condition signalled by a
thread which wants to return. Sorry, there is no other choice with
these assumptions.

I will answer to the rest of your message when I come back in the
evening.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Giancarlo Niccolai
2004-10-07 11:26:34 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
You say: "Service parallelism: when the target application has to
fulfill some service duty that may be invoked by several sources at
the same time, using parallelism improves performances. We'll see in
the next articles that the correct model when implementing service
parallelism is not that of creating an agent for every request that
must be evaded. This solution (one task, one agent) would seem
natural, but in reality this would be a sure way to destabilize the
application and reduce the overall system performance, unless it can
be demonstrated that there is a maximum (low) amount of possible
concurrent requests."
I would disagree in the case of green threads. The structures the
programmer would have to create in order to multiplex several message
sources are exactly what the threading subsystem implements. My thread
structure currently takes 26 words, i.e. 104 bytes on 32-bit machines,
plus the stack, which in case of newly created threads starts with 256
bytes and grows as needed.
In fact the article talks about system/OS bound threads. What you are doing
is not strictly speaking threading, as you can't have any direct OS/CPU
support to switch your execution context, but coroutining. AFAIK green
threads are something in between coroutining and system supported threads.
Anyhow, the limit applies the same. Unless you can prove (mathematically, I
mean) that the amount of incoming requests to be evaded concurrently is
limited to a treshold that will keep your model running smooth (i.e.
10,000? 100,000?) you should not use a 1:1 thread-service model.
Post by Marcin 'Qrczak' Kowalczyk
My GC in the worst case doubles the memory
usage of plain objects (two generations, both copying), but stacks are
managed differently and are not doubled (except when they overflow).
This means about 0.5 kB per thread if it doesn't use much of stack
during its life (a stack never shrinks). So with 500 MB of memory,
assuming that half of memory will be used for other purposes (e.g. the
data threads operate on, and other processes in the system), I could
create about 500 thousands of threads.
Oh yes. And what about managing them? I.e. the 500,000th thread when would
receive its own timeslice to progress? What if you have blocking calls
that, being at OS level, cannot be controlled correcty in coroutining? All
the other 499,999 threads are going to be blocked? How much time does it
take to perform a full "emulated" context switch? and so on.
Post by Marcin 'Qrczak' Kowalczyk
Well, I can't imagine what they could do. The system will not let open
that many files, and even if it did, the poll()/select() loop will
probably kill performance.
Any general purpose VM will kill performances, or better, promptness much
sooner than poll()/select().

You may also have fast context switching between green threads, but you'll
still have 499,000 * OPCODE_PER_SWITCH vm loops time before the first
thread can be served again. Supposing a loop takes 1e-6 seconds (i.e. much
better than LUA which is currently the fastest VM of this kind, and is
about 3,1e-6 seconds per mean loop) and supposing you do 1000 opcodes
before context switch you have 1e5 * 1e3 *1e-6 = 1e2 or 100 seconds before
the first thread is served again and can run for 1e-3 seconds. This not
counting function calls that takes longer than a VM loop, of course.

Giancarlo.
Marcin 'Qrczak' Kowalczyk
2004-10-07 15:21:58 UTC
Permalink
Post by Giancarlo Niccolai
In fact the article talks about system/OS bound threads. What you are doing
is not strictly speaking threading, as you can't have any direct OS/CPU
support to switch your execution context, but coroutining. AFAIK green
threads are something in between coroutining and system supported threads.
I thought that green threads are scheduled inside the process, while
coroutines and not preempted at all - they yield control to other
coroutines only explicitly or when they would block. With this
terminology they are green threads because they are preempted by the
scheduler.

If they are not, what is the difference between coroutines and green
threads?
Post by Giancarlo Niccolai
Unless you can prove (mathematically, I mean) that the amount of
incoming requests to be evaded concurrently is limited to a treshold
that will keep your model running smooth (i.e. 10,000? 100,000?) you
should not use a 1:1 thread-service model.
If there are too many requests, nothing will help.

What would be the advantage of multiplexing several requests in the
same thread manually? I would be doing exactly the same as now the
threading subsystem does under the cover!

Of course the implementation might be tuned better. I think Linux has
some alternative interface to poll() which doesn't require O(N) time
each time if the set of descriptors doesn't change much from call to
call, or O(N) searching for the descriptors which have changed.
I don't remember the details.
Post by Giancarlo Niccolai
Any general purpose VM will kill performances, or better, promptness
much sooner than poll()/select().
I'm not sure what do you call a VM. While my implementation is much
slower than a hand-tuned C code could be written (principles of the
language put correctness and convenience before efficiency, so e.g.
integer arithmetic doesn't overflow, arithmetic is dispatched on types
at runtime, and array access doesn't cause undefined behavior on
program errors), it compiles to C which is compiled to native code
(and later mangled for tail calls) - there is no interpreted bytecode.

Yes, I exaggerated in the estimation of the number of threads; I can't
guess how well it will perform. But in principle, without changing
other properties of the language (e.g. better optimization at the cost
of explicit type declarations and/or safety), changing the model to a
large number of services per thread would not help.

http://www.sics.se/~joe/apachevsyaws.html
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
David Hopwood
2004-10-07 18:18:13 UTC
Permalink
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
I would disagree in the case of green threads. The structures the
programmer would have to create in order to multiplex several message
sources are exactly what the threading subsystem implements. My thread
structure currently takes 26 words, i.e. 104 bytes on 32-bit machines,
plus the stack, which in case of newly created threads starts with 256
bytes and grows as needed.
These figures seem reasonable to me.
Post by Giancarlo Niccolai
In fact the article talks about system/OS bound threads. What you are doing
is not strictly speaking threading, as you can't have any direct OS/CPU
support to switch your execution context, but coroutining.
It's threading, not coroutining, from the point of view of a Kogut program.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Well, I can't imagine what [500,000 threads] could do. The system will not
let open that many files, and even if it did, the poll()/select() loop will
probably kill performance.
Any general purpose VM will kill performances, or better, promptness much
sooner than poll()/select().
You may also have fast context switching between green threads, but you'll
still have 499,000 * OPCODE_PER_SWITCH vm loops time before the first
thread can be served again.
No, you're assuming pure round-robin scheduling, which certainly would kill
performance. A reasonable implementation of user-level threading will spend
no time at all on threads that are waiting on I/O.
--
David Hopwood <***@blueyonder.co.uk>
Marcin 'Qrczak' Kowalczyk
2004-10-07 19:25:11 UTC
Permalink
Post by David Hopwood
No, you're assuming pure round-robin scheduling, which certainly
would kill performance. A reasonable implementation of user-level
threading will spend no time at all on threads that are waiting on
I/O.
Well, my current implementation is almost round-robin :-)

There is a ring of threads which are in one of these states:
- running
- sleeping with a timeout
- blocked on I/O

When the current thread blocks or its time slice is over, we need to
reschedule. Look at the next thread in the ring. If it is running,
switch to it. If not, use poll() on data based on the whole ring
and after poll() finishes, mark appropriate threads as running and
switch to the next available running thread.

Sleeping with a timeout could be handled better: they could be sorted
wrt. their time and we would look at the set with shortest times. I'm
not sure whether I will make this optimization because I would guess
it's rare when many threads sleep on a timeout.

There could also be a similar but separate ring which excludes running
threads, so constructing the poll() array would not need to skip over
them, but again I'm not sure if it's worth the effort. It would improve
the case when lots of threads are running and few of them are blocked
on I/O. I would guess (but it's a pure guess) that when the process has
lots of threads, then most of them are blocked on I/O.

Threads blocked on mutexes, conditions, joining and callbacks are not
in the ring.

I don't know how to handle I/O better: a context switch always occurs
after some real work has been done since last context switch, so we
can't assume the amount of time passed was too little to check for I/O
availability.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
David Hopwood
2004-10-08 02:05:09 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by David Hopwood
No, you're assuming pure round-robin scheduling, which certainly
would kill performance. A reasonable implementation of user-level
threading will spend no time at all on threads that are waiting on
I/O.
Well, my current implementation is almost round-robin :-)
- running
- sleeping with a timeout
- blocked on I/O
When the current thread blocks or its time slice is over, we need to
reschedule. Look at the next thread in the ring. If it is running,
switch to it. If not, use poll() on data based on the whole ring
and after poll() finishes, mark appropriate threads as running and
switch to the next available running thread.
Unfortunately, poll/select, and also Windows OVERLAPPED I/O using
WaitForMultipleObjects, scale badly for large numbers of FDs.

The scheduling should therefore be designed in such a way that it
is possible to use I/O completion ports, for example kqueue
<http://people.freebsd.org/~jlemon/papers/kqueue.pdf>
<http://www.fsl.cs.sunysb.edu/~kolya/projects/kqueue_for_linux.pdf>
or CreateIoCompletionPort in Windows.

The following approach allows completion ports to be used effectively,
and achieves the goal of spending no time on threads that are waiting
for I/O:

- keep only threads that are actually running in the thread ring.
- also keep a queue of threads that are waiting for timers, and a queue
of threads that have *completed* I/O. (If necessary, use a separate
kernel thread to receive completions.)
- after each timeslice of a thread taken from the running ring, move
the first N threads in the completed I/O queue into the running ring,
where N is a configurable parameter. Similarly for the timer queue.

(Inserting *all* of the threads in the completed I/O queue into the
running ring would risk starving compute-bound threads relative to
I/O-bound threads. This way guarantees that threads which do no I/O
will make progress.)

This approach also makes it easier to efficiently multiplex I/O across
kernel threads to make use of multiple processors, if you want to do
that in future.
Post by Marcin 'Qrczak' Kowalczyk
Sleeping with a timeout could be handled better: they could be sorted
wrt. their time and we would look at the set with shortest times. I'm
not sure whether I will make this optimization because I would guess
it's rare when many threads sleep on a timeout.
I wouldn't have thought the implementation of that was much more
complicated than an unoptimized one.
Post by Marcin 'Qrczak' Kowalczyk
Threads blocked on mutexes, conditions, joining and callbacks are not
in the ring.
That's good.
Post by Marcin 'Qrczak' Kowalczyk
I don't know how to handle I/O better: a context switch always occurs
after some real work has been done since last context switch, so we
can't assume the amount of time passed was too little to check for I/O
availability.
I/O completion ports solve this problem. Since these APIs are little-known
and somewhat tricky to use, it's much better to handle I/O multiplexing in
a VM than to force every application to do it.
--
David Hopwood <***@blueyonder.co.uk>
Jonathan Adams
2004-10-08 04:06:14 UTC
Permalink
Post by David Hopwood
The scheduling should therefore be designed in such a way that it
is possible to use I/O completion ports, for example kqueue
<http://people.freebsd.org/~jlemon/papers/kqueue.pdf>
<http://www.fsl.cs.sunysb.edu/~kolya/projects/kqueue_for_linux.pdf>
or CreateIoCompletionPort in Windows.
Solaris 10 will also have something along these lines:

http://blogs.sun.com/roller/page/barts/20040720#entry_2_event_ports

- jonathan
Marcin 'Qrczak' Kowalczyk
2004-10-08 05:47:40 UTC
Permalink
Post by David Hopwood
The scheduling should therefore be designed in such a way that it
is possible to use I/O completion ports, for example kqueue
<http://people.freebsd.org/~jlemon/papers/kqueue.pdf>
<http://www.fsl.cs.sunysb.edu/~kolya/projects/kqueue_for_linux.pdf>
or CreateIoCompletionPort in Windows.
Uhh, experimental prototype on Linux. Support for this in my compiler
will unfortunately have to wait for indefinite future, since I would
not even be able to test it.
Post by David Hopwood
(Inserting *all* of the threads in the completed I/O queue into the
running ring would risk starving compute-bound threads relative to
I/O-bound threads.
Wouldn't inserting them before the current thread guarantee that each
thread makes progress? I would guess that it's formally correct but
gives poor response times, right?

Generally I insert threads into the running ring (newly created
threads and threads unblocked from mutexes/conditions) before the
thread which is current at that time. Is that right? This guarantees
that no thread is starved, but might not be quite fair: threads which
do a lot of synchronization are penalized wrt. pure CPU bound threads.
Post by David Hopwood
I wouldn't have thought the implementation of that was much more
complicated than an unoptimized one.
You are right. I wrote that before implementing it, witout thinking.

Yesterday I did conditions and I/O, but I will improve the design a
bit, while still using only poll() for now:

- a ring of threads which are either running or waiting for I/O, to
select which thread has a chance to run next (if the next thread is
an I/O thread, then all I/O threads have a chance)
- a list of threads waiting for I/O, used to construct the poll array
- a list of threads waiting for timeout, sorted by their completion time

Are there still Unix flavors nowadays which have select() and no poll()?
Post by David Hopwood
I/O completion ports solve this problem. Since these APIs are little-known
and somewhat tricky to use, it's much better to handle I/O multiplexing in
a VM than to force every application to do it.
Is there something which runs on stock Linux kernels?

* * *

A separate problem is that the non-blocking flag for I/O is common to
all processes which inherited a particular file descriptor. Turning
it on is risky, as not all processes might be prepared to handle it.

For example the setting for stdin/stdout/stderr should be reset when
the program terminates, and I'm not sure about its state on SIGSTOP /
SIGCONT.

If the program forks and executes another program while substituting
some pipes it created for stdin/stdout/stderr, should I automatically
switch on the blocking mode back on these descriptors? The problem
is that if it doesn't substitute any pipes, just uses its own
stdin/stdout/stderr, then they would stay in blocking mode in the
main program since that time. Is there any better way than leaving
it to the user to switch the blocking/non-blocking mode, with the
default being non-blocking (we can't be sure which descriptors are
also sometimes used by internal threads)?

An alternative would be to keep the descriptors in blocking mode and
always do a poll() before read() or write(), but maybe it's better to
live with the consequences of permanent non-blocking mode? I see that
GHC turns the non-blocking mode while ruby does select() before each
operation.

* * *

Yet another problem. On Linux opening a FIFO in a non-blocking mode
for reading succeeds, and causes it to report EOF on first read()
if there is no writer yet, unless we do a poll() at the beginning.
And opening it for writing always fails for ENXIO. I "solved" this
by opening it in the blocking mode, and switching to non-blocking
afterwards, which means that opening a FIFO will block all threads.
This seems an unfortunate limitation of the OS. Is it possible to
open a FIFO for writing without blocking?
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
David Hopwood
2004-10-09 02:09:55 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by David Hopwood
The scheduling should therefore be designed in such a way that it
is possible to use I/O completion ports, for example kqueue
<http://people.freebsd.org/~jlemon/papers/kqueue.pdf>
<http://www.fsl.cs.sunysb.edu/~kolya/projects/kqueue_for_linux.pdf>
or CreateIoCompletionPort in Windows.
Uhh, experimental prototype on Linux. Support for this in my compiler
will unfortunately have to wait for indefinite future, since I would
not even be able to test it.
True, but what I'm suggesting is to look at the interface of kqueue,
Windows I/O completion ports, Solaris 10 event ports, etc., and design
something that would scale given the availability of an equivalent
interface. It's pretty easy to emulate this functionality (less
efficiently, of course) in terms of poll/select, ... *google*
... and in fact there are several implementations of that already:
<http://www.kegel.com/c10k.html#frameworks>.
libevent looks as though it would be suitable.
Post by Marcin 'Qrczak' Kowalczyk
Post by David Hopwood
(Inserting *all* of the threads in the completed I/O queue into the
running ring would risk starving compute-bound threads relative to
I/O-bound threads.
I didn't quite describe this correctly. I'll answer your questions about
it in a separate post.
Post by Marcin 'Qrczak' Kowalczyk
Yesterday I did conditions and I/O, but I will improve the design a
- a ring of threads which are either running or waiting for I/O, to
select which thread has a chance to run next (if the next thread is
an I/O thread, then all I/O threads have a chance)
- a list of threads waiting for I/O, used to construct the poll array
- a list of threads waiting for timeout, sorted by their completion time
Are there still Unix flavors nowadays which have select() and no poll()?
Post by David Hopwood
I/O completion ports solve this problem. Since these APIs are little-known
and somewhat tricky to use, it's much better to handle I/O multiplexing in
a VM than to force every application to do it.
Is there something which runs on stock Linux kernels?
libevent works on Linux, and also on Unices without poll() if there are
any left.
Post by Marcin 'Qrczak' Kowalczyk
* * *
A separate problem is that the non-blocking flag for I/O is common to
all processes which inherited a particular file descriptor. Turning
it on is risky, as not all processes might be prepared to handle it.
For example the setting for stdin/stdout/stderr should be reset when
the program terminates, and I'm not sure about its state on SIGSTOP /
SIGCONT.
fork is always problematic in a VM implementation. The fact that both
fork and exec are needed to create a new process is IMHO a significant
design wart of original Unix; Windows CreateProcess or posix_spawn
<http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html>
provide the functionality that is more often needed.
Post by Marcin 'Qrczak' Kowalczyk
If the program forks and executes another program while substituting
some pipes it created for stdin/stdout/stderr, should I automatically
switch on the blocking mode back on these descriptors?
Yes, I think.
--
David Hopwood <***@blueyonder.co.uk>
David Butenhof
2004-10-11 16:37:34 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Condition mutex // make a condition
Wait condition
Wait condition, mutex // you need a mutex to do it clean.
The mutex is implicit in the condition, so I don't see a point in
forcing the user to redundantly specify it. There is always only one
value of the argument which makes sense.
An exception will be thrown when the mutex is not held by this thread,
so mistakes in using a wrong mutex should be detected (except when the
other mutex is locked too, but I don't believe it's so common mistake
to warrant another function argument).
Given that the Condition variable is apparently initialized with
reference to a specific mutex (implied by the "Condition mutex"), and
with pervasive GC, I would tend to agree.

In the POSIX interface, "Wait" (pthread_cond_[timed]wait) take both
mutex and condition because there's no fixed creation-time relationship.
Condition variable and mutex are separate objects, and the dependency is
dynamic rather than static. The rule is that no application can have
threads waiting CONCURRENTLY on the SAME condition variable using
DIFFERENT mutexes.

That model supports application-level caching and "repurposing" of
condition variables and mutexes. (Typical Java implementations built on
top of POSIX often do this, for example, to implement object
'synchronized' mutexes and wait/notify methods.) In principle, both
mutex and condition might represent kernel objects that are relatively
expensive to free and reacquire. That's much less of an issue with GC;
but of course POSIX is a C language interface which doesn't support GC.
--
Dave Butenhof, ***@hp.com
iCOD/PPU, thread consultant
Manageability Solutions Lab (MSL), Hewlett-Packard Company
110 Spit Brook Road, ZK2/3-Q18, Nashua, NH 03062
Marcin 'Qrczak' Kowalczyk
2004-10-10 15:32:27 UTC
Permalink
Post by Giancarlo Niccolai
Finally, how much is advanced your language?
Things implemented or partially implemented:

* Core language features, e.g.
- garbage collection
- functions with proper tail calls, including local and anonymous
functions
- generic functions (dispatched on types of arguments)
- singleton and record types
- types with programmed behavior of instances and a funny kind of
object inheritance
- pattern matching
- exceptions
- lazy variables
- separately compiled modules similar to Haskell
- separately linked libraries (only static linking at the moment)
- threads (incomplete).
* Numbers: arbitrary precision integers, ratios, floats, and complex
numbers.
* Collections: a generic interface, lists, lazy lists, Unicode strings
(immutable), vectors (with mutable contents), arrays (with mutable
contents and resizable on both ends in amortized O(1) time), hash
dictionaries, hash sets.
* I/O streams: raw OS files, buffering, transparent character recoding,
special stream types like concatenation or making a copy of everything
which is written. Arrays support the stream interface, and conversely
buffered streams support a subset of collection interface (unlimited
lookahead and putback).
* Embedding C snippets in code. This heavily depends on internals of
the implementation and is considered specific to this implementation,
not an integral part of the language. I don't know yet how a more
convenient way of accessing C libraries should look like.
* Some separately linked libraries:
- Unix API (incomplete, for now: directories, processes and execution
of programs, environment variables, stat() with friends, pipes,
I/O redirection, sending signals).
- Bindings to gzip and bzip2, usable as stream filters.
- Binding to curses, uses wide character API when available (works on
UTF-8 terminals).
- Bindings to Python and Perl (the latter is incomplete) which allow
to use their libraries.
* An example program: Scheme interpreter (R5RS without macros).

The compiler is self-hosting. It generates semi-portable C code
and thus can be bootstrapped on new architectures without problems.

Things not done at all, planned for future:

* Weak references.
* Native finalizers (currently only C finalizers which don't call back
are supported; native finalizers require threads).
* Networking, including implementing protocols like http and smtp.
* Compilation to some form which can be interpreted.
* Macros (a bit similar to syntax-case in Scheme).
* Dynamic linking of compiled code.
* Interactive read-eval-print loop.
* Serialization.
* Windows port (mingw, cygwin, VC++ - I don't know which of these).
* Bindings to more libraries, e.g.:
- Interfacing with other languages and environments (.NET, OCaml,
Haskell, Scheme, Java).
- Native bindings to Gtk+ (currently it works via Python and Perl).
- Bindings to GNOME libraries.
- Database engines.
- Image manipulation.
- OpenGL.
- More complete Unix API (e.g. memory mapped files, shared memory,
terminals, user database).
Post by Giancarlo Niccolai
Incidentally, I am developing Haste at
http://www.haste.ws
And, of course, one of its strength has to be MT. I am searching for
cooperation, and as you have the same interest in making languages
we may find an agreement ;-).
Sorry, it would be hard to agree on too many things because I already
have a vision of a perfect language :-) And I already have a project
to which I devote all free time.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
David Hopwood
2004-10-05 02:06:33 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
could be used instead, but I don't know any "complete" declarative
concurrency model.
Declarative models are not complete pretty much by definition: the whole
point is that they trade expressiveness for determinism and ease of
reasoning.

A language should be able to support declarative concurrency as a subset,
but shouldn't be restricted to it.
--
David Hopwood <***@blueyonder.co.uk>
Jesse Jones
2004-10-05 19:17:59 UTC
Permalink
Post by Marcin 'Qrczak' Kowalczyk
I'm still open to not basing the primitives on mutexes and conditions
at all, I would be happy if something more elegant yet expressive
could be used instead, but I don't know any "complete" declarative
concurrency model.
In my opinion the fundamental problem is that concurrent execution
doesn't fit well into a sequential model of computation. If you try it
with an imperative language you wind up with a difficult and error
prone model requiring synchronization before accessing any shared
mutable structure. With pure functional languages you don't need to do
this, but there are some pretty severe limits on exactly how much you
can do because you can't do anything non-deterministic.

That's why I'm interested in the pi calculus: it's a formally sound
model designed to express parallel computation. It seems to be a much
better basis for building threading abstractions on top of then trying
to retrofit them onto a sequential language.

-- Jesse
Loading...