Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Condition mutex // make a condition
Wait condition
Wait condition, mutex // you need a mutex to do it clean.
The mutex is implicit in the condition, so I don't see a point in
forcing the user to redundantly specify it. There is always only one
value of the argument which makes sense.
An exception will be thrown when the mutex is not held by this thread,
so mistakes of using a wrong mutex should be detected (except when the
other mutex happens to be locked too, but I don't believe that's such
a common mistake as to warrant another function argument).
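For comparison, Python's threading.Condition follows the same design as sketched above: if no lock is passed, the condition creates its own internal one, and calling wait() without holding it raises RuntimeError. A short illustration:

```python
import threading

cond = threading.Condition()  # implicit mutex, like "Condition mutex"

# Calling wait() without holding the associated lock is detected:
try:
    cond.wait(timeout=0.01)
    misuse_detected = False
except RuntimeError:
    misuse_detected = True

# The correct protocol: acquire the implicit lock, then wait.
ready = False

def producer():
    global ready
    with cond:          # locks the condition's own mutex
        ready = True
        cond.notify()   # wake one waiter

threading.Thread(target=producer).start()
with cond:
    while not ready:    # guard against spurious wakeups
        cond.wait()
```

Since there is always exactly one lock that makes sense for a given condition, the API never needs to ask for it again at the wait site.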
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Yield() // give some CPU to other threads
A bad idea; It would be a good idea if you had coroutining (that is,
totally emulated threading via VM abstraction). Are you using system
API or are you "emulating" threading via VM opcount slices? In this
second case, then Yield is good.
I am "emulating" threading (I think I've mentioned this). I have to do this
because:
1. Multithreaded garbage collection is hard.
2. The runtime is using global variables for performance-critical tasks
(e.g. argument passing), making them thread-local would be bad.
And it's easy in my case, because I internally use continuation-passing
style and custom stack to have proper tail calls and precise GC. The
timer asks for a context switch by faking a stack overflow condition.
The code, which checks for stack overflow anyway, detects that it was
a request for context switch and proceeds accordingly.
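The mechanism above can be modeled in a few lines (an illustrative toy, not the Kogut runtime): the timer fakes a stack-overflow condition, so the stack check that each function performs anyway doubles as the point where a context-switch request is noticed.

```python
# Toy model: a fake "stack limit" clobbered by the timer so that the
# ordinary overflow check also detects context-switch requests.

STACK_SIZE = 1024
stack_limit = STACK_SIZE     # normal overflow limit
switch_requested = False

class SwitchNow(Exception):
    """Raised by the stack check when a context switch was requested."""

def request_switch():
    # plays the role of the timer/signal handler: fake an overflow
    global stack_limit, switch_requested
    switch_requested = True
    stack_limit = 0          # every subsequent stack check now "fails"

def stack_check(used):
    # the check performed at stack-frame allocation anyway
    global stack_limit, switch_requested
    if used >= stack_limit:
        if switch_requested:           # not a real overflow
            switch_requested = False
            stack_limit = STACK_SIZE   # restore the real limit
            raise SwitchNow()          # the scheduler would switch here
        raise MemoryError("real stack overflow")

stack_check(100)             # normal case: passes silently
request_switch()
try:
    stack_check(100)         # same call, but now it requests a switch
    switched = False
except SwitchNow:
    switched = True
```

The appeal of the trick is that the hot path pays for exactly one comparison, which it was paying for already.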
I'm here breaking strict ISO/ANSI C rules: the signal handler modifies
global volatile variables of pointer type, not just sig_atomic_t.
I hope it works on all interesting architectures. It would be too
painful to do otherwise because it would require an explicit check.
The only cost is that I had to disable the optimization of allocating
the stack frame as late as possible. The optimization allowed it to
be avoided altogether in some branches of control flow (it is often
possible to do without a stack frame if we aren't calling other
functions in non-tail-call positions, because temporary data can be
held in virtual registers emulated in global variables, or even in
plain local C variables if GC can't happen during working with them).
Unfortunately not all places where it got allocated were safe for GC,
so they were not safe for a context switch either. And it would be
inconvenient to restart in the middle of a C function: it would have
to be split at that point (it's split anyway in lots of places, but
not necessarily when the stack frame is allocated). With my virtual
register allocation algorithm it was too hard to ensure that the stack
frame is allocated in safe places only, so now I allocate it at the
beginning of each function which needs it at all. This caused a
3% slowdown (measured on the compiler itself).
My runtime is similar to a VM in that it uses virtual registers,
custom stack, and custom calling convention, but the code is compiled
into C, there are no opcodes for VM instructions.
I know the advantages of native threads: that blocking I/O is executed
in parallel without having to emulate it using non-blocking I/O, that
multiple processors are utilized by a single process, and that context
switches don't need to be delayed until a safe point (which will
generally come very soon, but may take long in the case of long
functions implemented in C; for example bignum arithmetic is atomic
wrt. context switches). Anyway, I have little choice. I consider both
approaches viable, with different strengths (I would guess my threads
are more lightweight than OS threads).
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
I'm not sure what should happen with unhandled exceptions in
threads. Note that an unhandled exception in the main thread runs a
global handler which by default dumps a stack trace on stderr. But
what about exceptions in threads? If they are just recorded for
whoever does JoinThread, then they will be lost silently if nobody
does JoinThread. OTOH dumping a stack trace when the program as a
whole is not going to die seems silly. Should there be a thread-local
handler, similar to the handler for the main thread? What should it
do by default?
I would see better a VM-related error handler; you want to have
another handler? Then have another VM. All the threads handled in a
VM should die the same way; they should not be allowed to override
the VM's ability to handle them.
Sorry, I don't understand.
First, it is possible to override the global error handler, and I see
nothing wrong with that. For example it makes sense for a CGI program
to dump the message and stack trace to the HTML page it produces. This
handler is simply a function held in a global variable (variable on
the Kogut side, not C side).
With an unhandled toplevel exception the program is going to
eventually die no matter what the handler does, but the handler can
choose a better medium for communicating the message than stderr,
or format the stack trace differently.
The stack trace is available only in that handler, but not in ordinary
'try', because it's materialized only when needed. An exception which
is caught is very fast. This means that 'try' around the bulk of the
program is not a substitute for the toplevel error handler.
The function ExitProgram generates an exception which can be caught
in the normal way; the main purpose is for Finally clauses to execute.
This exception is special in one way: it does not cause materializing
the stack trace and calling the toplevel error handler. If it reaches
the toplevel, the program is just terminated.
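Python's SystemExit behaves much like the ExitProgram exception described above: it propagates as an ordinary exception, so cleanup clauses run and it can be caught, yet if it reaches the toplevel the program terminates without any stack trace being printed. For example:

```python
# SystemExit as an analogue of ExitProgram: an ordinary, catchable
# exception whose main purpose is to let cleanup clauses execute.

log = []

def work():
    try:
        raise SystemExit(0)        # analogous to ExitProgram
    finally:
        log.append("cleanup ran")  # Finally clauses still execute

try:
    work()
except SystemExit:
    log.append("caught in the normal way")
```

If nothing catches it, the interpreter simply exits with the given status, with no traceback materialized.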
Now we have threads. Thread cancellation is similar to exiting the
program, only on the thread level, and it differs in that it may be
generated from outside. So this exception should just terminate the
thread silently, after it has been propagated to *its* toplevel.
But what about other exceptions? If they are not caught in the thread,
they usually are bugs or I/O errors which have not been checked.
If someone joins the thread, he will see the exception; if this is the
main thread and it does not catch exceptions thrown by JoinThread,
the exception will terminate the whole program with a stack trace dump
(this definitely shows a bug in the program: it *should* have been
caught at some point, either in the thread or after joining it).
But if nobody joins it, the exception will be silently discarded.
Errors should not be discarded silently! So, what to do with them?
Having a similar thread-local handler which prints them on stderr
by default (except thread cancellation) has two problems:
- It may be printed in the middle of the computation (what is worse,
StdErr is currently not thread-safe, because it's buffered on the
Kogut side using pure Kogut code, not atomic C code).
- If I choose to propagate it to JoinThread after it has been printed
on StdErr, it will be printed twice if not caught in the main thread.
This makes no sense.
Options I imagine:
1. Currently it's just recorded in the thread state and propagated by
JoinThread. Leave it this way.
2. Have a thread-local toplevel error handler, but which defaults to
rethrowing the exception, with the effect like above. Somebody can
install a handler which dumps error information to StdErr instead.
3. Like 2, but reverse the default: it's usually printed with a stack
trace. Problem: what JoinThread should return? If it's some normal
value like Null, it may confuse the computation which expects
something different. If it rethrows the exception, it will be often
reported twice.
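Option 2 (whose default behaves like option 1: the exception is recorded and rethrown by the joiner) can be sketched in Python; the class and method names here are illustrative, not part of any real API:

```python
import threading

# Sketch: each thread records its unhandled exception instead of
# printing it; join rethrows it in the joining thread, so the error is
# reported exactly once, by whoever joins.

class JoinableThread(threading.Thread):
    def __init__(self, target):
        super().__init__()
        self._target_fn = target
        self._exc = None

    def run(self):
        try:
            self._target_fn()
        except BaseException as e:
            self._exc = e          # record, don't print

    def join_checked(self):
        self.join()
        if self._exc is not None:
            raise self._exc        # rethrown in the joiner

def buggy():
    raise ValueError("boom")

t = JoinableThread(buggy)
t.start()
try:
    t.join_checked()
    outcome = "no error"
except ValueError as e:
    outcome = str(e)
```

A thread-local handler would simply replace the `except BaseException` branch; the difficulty discussed above is only what the default should be.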
Anything else? If I see no other way, I will leave it as it is. This
means that the error is silently discarded if nobody does JoinThread.
One can insert an explicit 'try' around the thread body and do
something else; the only thing one can't obtain that way is a stack trace.
Note that (when I implement thread cancellation) the exception to
cancel a thread will be propagated like others. This means that if
a thread has been cancelled, and another thread performs JoinThread
without catching exceptions, it will be cancelled as well. I guess
this is fine.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Another problem. What should the program do when the main thread has
finished but some other threads have not?
1. Wait for them to finish (or wait forever if they deadlocked).
2. Just kill them at once. The programmer should avoid this.
3. Cancel them as if with CancelThread, wait for them to finish.
4. Try to let the program go into the background by doing fork().
This has bad side effects, e.g. pid is changed, and children
processes are lost, but maybe it's the closest approximation
of what the programmer had in mind.
I would go for 2: the programmer should avoid this IF IT DOES NOT WANT THIS.
This is indeed what happens when I do nothing special about it.
I feel uneasy about abrupt termination of computation, without even
running Finally clauses, but OK - the programmer can indeed program
other behaviors by hand.
If the main thread is terminated by an uncaught exception, it makes
sense to report the error and abort, even if other threads are doing
some work.
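Python's daemon threads are essentially option 2 above: when the main thread finishes, daemon threads are killed abruptly, without even running their finally clauses, while non-daemon threads give option 1 (the interpreter waits for them). A minimal illustration:

```python
import threading
import time

# A daemon thread is abruptly terminated when the main thread exits;
# its finally clause below never runs. With daemon=False, the
# interpreter would instead wait for the thread to finish (option 1).

def background():
    try:
        time.sleep(60)             # pretend to work forever
    finally:
        print("never reached for a daemon thread")

t = threading.Thread(target=background, daemon=True)
t.start()
# The main thread falls off the end here; the daemon is killed at once.
```

So the programmer chooses per thread whether abrupt termination at program exit is acceptable.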
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Another issue: Unix signals. Of course I, the implementor, must handle
them specially, because too little is physically possible to be
executed from them directly. But what should be the effect? Does it
make sense to start a new thread for a signal?
No. It makes sense to start a thread to handle ALL the signals. I did a
http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/vm/thread.c?rev=1.177&view=auto
http://cvs.sourceforge.net/viewcvs.py/xharbour/xharbour/source/rtl/hbserv.c?rev=1.21&view=markup
The second file handles fork and signal masking.
Digging into complex code will take me some time. I hope you don't
mind if we talk about intended effects before I dig into this
implementation.
So you are saying that it should behave analogously to C, i.e.
temporarily suspend the computation of that thread until the signal
handler returns? OK, the programmer can start a thread from the
handler himself if he wants to wait for some condition or such.
How does the programmer designate the thread? By executing some
function ("I want to handle signals") in that thread? It defaults to
the main thread, of course.
The handler will be physically started at a safe point, which in my
case means when a context switch may occur: at the beginning of any
function which allocates a stack frame. I think I can let it behave as
if it was an asynchronous function call, i.e. the handler can return
(its result is ignored and computation resumes) or throw an exception
(it is thrown into the interrupted code). So ExitProgram from the
handler will behave in the expected way.
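CPython's signal handling matches this model closely: handlers always run in the main (designated) thread, only at safe points (bytecode boundaries), and may either return, resuming the computation, or raise an exception that is thrown into the interrupted code. A Unix-only illustration:

```python
import os
import signal

# The handler raises; the exception is thrown into whatever code the
# main thread was running when the safe point was reached.

class Interrupted(Exception):
    pass

def handler(signum, frame):
    raise Interrupted()            # thrown into the interrupted code

signal.signal(signal.SIGUSR1, handler)

try:
    os.kill(os.getpid(), signal.SIGUSR1)  # signal becomes pending
    for _ in range(10**6):                # handler fires at a safe
        pass                              # point inside this loop
    result = "not interrupted"
except Interrupted:
    result = "interrupted"
```

This is why an ExitProgram-style exception raised from a handler naturally unwinds the interrupted thread.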
What if the thread which handles signals is blocked on a condition
variable? If I don't do anything special, it will handle the signal
after it is unblocked, i.e. after the condition is satisfied, and
until that time it's just recorded somewhere as pending.
When the signal arrives several times before we have a chance to
handle it, should the handler be run multiple times or just once?
Or maybe I should physically mask it as soon as it arrives, and
unblock before or after the signal handler, so I don't have to worry
and the OS will choose whether to fire it again when it's unblocked?
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
But it wants to re-enable it without side effects.
If there aren't cancellation points up to the end of the thread,
then there are no side effects.
Let's assume that there are.
Post by Giancarlo Niccolai
Else, you should implement "kind cancellation" by setting a flag in
the thread data, a flag that would be read by the VM and would then
terminate the thread.
I'm not talking about implementation but intended behavior. Since I
implement threads myself, the cancellation will be as kind as needed.
It will be terminated in a safe point according to cancellation rules
(during a blocking operation by default, which can be temporarily
changed to "never" or "any time").
But this doesn't help. The cancellation policy is the default in this
case, in order to interrupt blocking I/O (of course it's non-blocking
I/O under the hood, but that's not the point). So it will interrupt the read,
but possibly it will interrupt something else if it fires too late.
I want to interrupt the read, but not anything later.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
It is a race condition. There is a small but non-zero probability that
the timeout fires after ReadBlock completed, but before CancelThread.
This does not constitute a race condition. A race condition implies
that the race must be destructive in some way: causing data loss or
deadlock.
It is destructive: it causes the termination of a thread that I don't
want to be terminated.
Post by Giancarlo Niccolai
But this is the implied behavior of the code you wrote. It's, so to
speak, a correct behavior.
The code was deliberately wrong to show what I'm trying to achieve
and what the problem is. I know that this is implied - that's the
problem! I want to implement a timeout for an I/O operation in terms
of the threading primitives my language will provide. The thread which
wants to perform this timed I/O must not permanently change its state
to "cancellation is pending". It must either return with the answer
"the I/O occurred on time, here is what was read" or "the I/O could not
be performed before the timeout".
I can distinguish the effect of the read (because it's being read into
an array which knows how much of it is filled), but I don't know how
to safely interrupt the read without the risk of interrupting some
random blocking operation that will be performed later.
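One way to close the race is to make the reader and the timeout agree, under a mutex, on who "won": a cancel that fires after the read completed finds the done flag set and does nothing, so no later blocking operation is ever touched. A sketch in Python (illustrative names; the queue stands in for ReadBlock, and a real primitive would interrupt the blocking read itself rather than rely on its own timeout):

```python
import queue
import threading

def timed_get(q, timeout):
    # Returns the value read, or None if the timeout won.
    state = {"done": False}
    lock = threading.Lock()
    result = {}

    def canceller():
        with lock:
            if state["done"]:
                return              # too late: the read already won
            state["done"] = True
            result["value"] = None  # record the timeout

    timer = threading.Timer(timeout, canceller)
    timer.start()
    try:
        value = q.get(timeout=timeout + 0.1)  # stand-in for ReadBlock
        with lock:
            if not state["done"]:   # only claim victory if cancel
                state["done"] = True  # hasn't fired yet
                result["value"] = value
    except queue.Empty:
        pass
    finally:
        timer.cancel()
    return result.get("value")

q = queue.Queue()
q.put("data")
got = timed_get(q, timeout=1.0)                   # read wins
missed = timed_get(queue.Queue(), timeout=0.05)   # timeout wins
```

The key property is that "cancellation pending" never survives past the function: the decision is made exactly once, under the lock.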
Post by Giancarlo Niccolai
As I said, do-it-yourself: provide a kind of VM driven thread
cancellation request.
I'm asking how to *use* these kind cancellation requests to implement
timed I/O.
Post by Giancarlo Niccolai
Post by Marcin 'Qrczak' Kowalczyk
Sure. But for detected deadlocks there comes a question whether
throwing an exception and allowing the code to report the bug
gracefully is good, or maybe it's not a bug but the programmer for
some silly reason intended it to deadlock and doing anything else
would be incorrect?
No, a deadlock is just an error. So if you can detect it and throw
an exception, do it.
I changed my mind here.
Erlang doesn't provide mutable data structures. But they can be
emulated with threads with local state if somebody needs them.
Similarly, in my Kogut it will be possible to implement data structures
as threads which listen to messages, even though of course it's not
needed, since plain mutable variables exist. When someone nevertheless
makes an "object" with an associated thread, it should behave sanely.
Now, what should happen if he makes such an object, uses it, then forgets
about it? From the point of view of the user of such object it's an
ordinary garbage collectible object. He might even not know that it's
implemented using a thread internally. The thread will be left blocked
on a condition variable inside some message queue structure, and the
runtime will at some point determine that all this is garbage because
the condition variable is not reachable by any running or potentially
running thread.
So the thread should not be disturbed by being woken back to life
and trying to respond to an exception. It's just garbage collected.
Now I'm sure that this is right.
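The Erlang-style idea above, a mutable "object" implemented as a thread serving messages from a queue, can be sketched in Python. The user sees an ordinary object and need not know a thread sits behind it. (In Python the server thread is not garbage-collected along with the object, so an explicit stop message stands in for the collection described above; all names are illustrative.)

```python
import queue
import threading

class Cell:
    """A mutable cell whose state lives in a message-serving thread."""

    def __init__(self, value):
        self._inbox = queue.Queue()
        threading.Thread(target=self._serve, args=(value,),
                         daemon=True).start()

    def _serve(self, value):
        while True:
            op, arg, reply = self._inbox.get()
            if op == "get":
                reply.put(value)
            elif op == "set":
                value = arg
                reply.put(None)
            elif op == "stop":
                return             # the thread simply goes away

    def _call(self, op, arg=None):
        reply = queue.Queue()
        self._inbox.put((op, arg, reply))
        return reply.get()

    def get(self):
        return self._call("get")

    def set(self, v):
        self._call("set", v)

    def stop(self):
        self._inbox.put(("stop", None, None))  # no reply expected

c = Cell(1)
c.set(41 + 1)
val = c.get()
c.stop()
```

With the GC behavior described above, the stop message would be unnecessary: once the inbox is unreachable, the blocked server thread is simply collected.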
A separate case is a total deadlock when all threads are blocked on
mutexes, conditions and joining, there are no threads running nor
blocked on I/O or timeout, so there is no chance that anything
will change. It happens that this would break the invariant of the
scheduler that the ring of running and potentially running threads is
non-empty. So I have little choice but to abort the program with an
error message written to stderr, and this is not recoverable because
there is no thread which could handle the recovery.
Post by Giancarlo Niccolai
I just suggest you provide two kinds of exceptions: stoppable and
unstoppable. The latter may only be caught by the VM, which would
pass it to an application-level (not script-level) handler or print a
stack trace and exit if the handler has not been set. The script
should not be allowed to catch some exceptions that are outside its
recovery ability: a script can't recover from a deadlock.
There is no application vs. script. A program in Kogut is compiled to
an ordinary executable, and it's the only thing being run.
At some time it will be possible to embed it in some other program,
but currently there is no interpreter to execute code which has just
been processed. It's being compiled into C, a C compiler is called
to compile it, and its assembler output is mangled to make function
calls faster (tail calls require a different calling convention).
Unfortunately the generated code is quite big, so the executables
are big (I don't use dynamically loaded libraries yet) and 90% of
the compilation time is used by gcc.
Post by Giancarlo Niccolai
http://www.niccolai.ws/works/articoli/art-multithreading-en-1a.html
You say: "Service parallelism: when the target application has to
fulfill some service duty that may be invoked by several sources at
the same time, using parallelism improves performances. We'll see in
the next articles that the correct model when implementing service
parallelism is not that of creating an agent for every request that
must be evaded. This solution (one task, one agent) would seem
natural, but in reality this would be a sure way to destabilize the
application and reduce the overall system performance, unless it can
be demonstrated that there is a maximum (low) amount of possible
concurrent requests."
I would disagree in the case of green threads. The structures the
programmer would have to create in order to multiplex several message
sources are exactly what the threading subsystem implements. My thread
structure currently takes 26 words, i.e. 104 bytes on 32-bit machines,
plus the stack, which in case of newly created threads starts with 256
bytes and grows as needed. My GC in the worst case doubles the memory
usage of plain objects (two generations, both copying), but stacks are
managed differently and are not doubled (except when they overflow).
This means about 0.5 kB per thread if it doesn't use much stack
during its life (a stack never shrinks). So with 500 MB of memory,
assuming that half of the memory will be used for other purposes (e.g.
the data threads operate on, and other processes in the system), I could
create about 500 thousand threads.
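The back-of-the-envelope arithmetic above, spelled out:

```python
# 26 words of thread structure at 4 bytes per word (32-bit machine),
# plus the 256-byte initial stack, rounded up to ~0.5 kB per idle
# thread to cover allocator and GC overhead.

words = 26
word_size = 4                       # bytes on a 32-bit machine
struct_bytes = words * word_size    # 104 bytes
initial_stack = 256                 # bytes, grows as needed
per_thread = 512                    # ~0.5 kB, rounding up for overhead

memory = 500 * 1024 * 1024          # 500 MB total
budget = memory // 2                # half left for threads
threads = budget // per_thread      # on the order of 500 thousand
```

So the half-a-megabyte-per-thread cost of typical OS thread stacks is replaced by roughly half a kilobyte.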
Well, I can't imagine what they could do. The system will not allow
opening that many files, and even if it did, the poll()/select() loop
would probably kill performance. My scheduler will need time and temporary
space proportional to the number of threads which are running,
sleeping on timeout or sleeping on I/O, unless the next thread in this
ring is running, in which case it takes O(1). It must save 13 global
variables in the old thread, reload them from the new thread, and
perform some administrative computations to see what actually needs
to be done and to which threads.
I'm mixing present and future tense because I started implementing
threads a week ago and they are already partially working. What works:
threads as such, their garbage collection, timer ticks (using
setitimer internally), JoinThread, mutexes, Yield, callbacks from C
mixed with threads. What remains to be done: conditions, I/O (blocking
emulated with non-blocking, with poll()/select() in the scheduler),
Sleep, cancellation, thread-local variables, convenient high-level
threading constructs like message queues, signals, TryLock, maybe
thread priorities (but I think I will leave them for the indefinite
future; I don't know how they should work anyway).
Windows would probably need WaitForMultipleObjects or something like
this instead of poll()/select(). I'm leaving this for the future because
I don't know WinAPI. I haven't tried to compile my compiler on Windows
yet.
poll()/select() is not sufficient for waiting for a child process to
terminate. If one thread is waiting for a process and another thread
is waiting for I/O, OCaml performs a semi-busy wait (a poll()/select()
with a short timeout, interleaved with non-blocking waitpid()). I will
have to do the same.
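The semi-busy wait described above looks roughly like this (a Unix-only sketch: the pipe stands in for whatever descriptors I/O-bound threads are blocked on):

```python
import os
import select
import subprocess
import sys

# One "thread" waits for a child process, another for I/O: poll the
# descriptors with a short timeout, and between polls do a
# non-blocking waitpid() to check on the child.

child = subprocess.Popen([sys.executable, "-c", "pass"])
r, w = os.pipe()                   # stand-in for I/O a thread waits on

finished = False
while not finished:
    # poll I/O with a short timeout instead of blocking forever
    ready, _, _ = select.select([r], [], [], 0.05)
    if ready:
        pass                       # would wake the I/O-bound thread
    # non-blocking check on the child between polls
    pid, status = os.waitpid(child.pid, os.WNOHANG)
    if pid != 0:
        finished = True            # would wake the waiting thread

os.close(r)
os.close(w)
```

The short timeout bounds how long a child's termination can go unnoticed, at the cost of waking the scheduler a few times per second.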
Callbacks from C, which are fortunately very rare, present an
interesting challenge. With green threads it's possible that callbacks
made in separate threads try to return in a different order than the
reverse of their starts, which is impossible to implement because
their stack frames are on the same system stack and in the wrong
order. My implementation temporarily blocks threads whose callbacks
want to return out of order, until the appropriate returning thread is
found. For some programs it just works. For others it might cause a
deadlock if the right thread waits for a condition signalled by a
thread which wants to return. Sorry, there is no other choice with
these assumptions.
I will answer to the rest of your message when I come back in the
evening.
--
__("< Marcin Kowalczyk
\__/ ***@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/