Post by skallerPost by Brian HurtPost by skallerBut it is irrelevant, because programmers do not intend to write
arbitrary code. They intend to write comprehensible code, whose
correctness is therefore necessarily easy to prove.
What they intend to do and what they do do are often completely different.
It is a rare programmer who *intentionally* adds bugs to his code. Yet
bugs still happen.
Of course. But my point is that a lot of the arguments
I have seen that 'we cannot do that because the general
problem is NP-complete' are irrelevant.
The problem isn't NP-complete, it's unsolvable in the general case.
The fact that it's insolvable means that the language can not issue
blanket gaurentees, just "best effort" requirements. Which work fine
right up until the point that the best effort fails. An example of this
problem in action, consider tail recursion optimization. But even here it
fails- for most inputs, functions that aren't properly tail recursive
still work correctly. And when it doesn't work correctly, it fails in an
obvious (and easy to debug) way- a stack overflow.
With precise destruction of objects, as you yourself point out, encourages
the programmer to depend upon it. When the program hits some point of
complexity where best effort fails (and since the problem is unsolvable in
the general case, there will be a point where complexity pass this
threshold), suddenly the program is struck by subtle, hard to debug bugs
(lines being written to files in the wrong order, etc).
Post by skallerPost by Brian HurtBut my point here is that determining the *exact* time that an object
becomes garbage can be arbitrarily complex. Most of the time, I agree, it
won't be.
The point is though: when it matters, the onus is on
the programmer to make sure the design is able to
determine an 'exact enough' time, using whatever
tools the language may provide.
What may be obvious to the programmer may not be obvious to the compiler.
To me, the code:
let rec append src dst =
match src with
| [] -> dst
| h :: t -> h :: (append t dst)
;;
should be tail recursion optimizable. The ocaml compiler, however,
doesn't.
If the programmer can specify exactly when a resource should be released,
then there should be a function he can call at that point to release the
resource. At which point the destructor doesn't need to do anything.
Post by skallerAn obvious candidate is a pure stack protocol
(automatic store) and another is a hierarchical
system (ref count).
A stack based store would be tricky with a functional language, but
*might* be doable. It's completely unworkable with an object oriented
language. The whole idea of OO programming is that the object interface
hides the internal details of the object, including it's size.
Reference counting has an unacceptable performance penalty compared to
mark and sweep, let alone more advanced garbage collection algorithms.
Plus it has problems with circular data structures (which is why most
reference counting implementations have a backup mark and sweep). Which
is why reference counting went out of vogue in the 1970's.
Note that if all else fails, you *can* implement reference counting GC on
top of Ocaml (you probably need to do some Obj.magic hacks).
Post by skallerWhen there are circularities .. due to the complexity
it would be unwise to count on synchronous finalisation,
since it isn't clear when that is (by definition it
is rather hard to determine ..)
My point *exactly*.
Post by skallerPost by Brian HurtOnce you allow for the fact that finalizers/destructors may not happen in
a defined order or at defined times, why not go whole-hog?
Hard question. One job a language is often required to do
is emulate the behaviour of a program written in another
language .. that's not always easy to do when the object,
type, allocation or execution models are different.
Which is why languages should be carefull to constrain their
implementations as little as possible. Is the problem Ocaml's, for not
having the type of GC required by Interscript, or Interscript's, for to
tightly defining what type of GC they have? Or programmers, for
programming by coincidence ("It worked on my machine...")?
This also allows the language to improve it's implementations- for example
replacing a reference counting GC with a mark and sweep GC.
Post by skallerPost by Brian HurtIn the most intractible cases, neither the compiler nor the programmer may
be able to determine when the last reference is released. But even in the
simple cases, the programmer may have the intent to release the resources
and simply doesn't. It's called a bug.
in Java, variables must be initialised. Unlike Ocaml though,
they do not have to be initialised at the point of
its a compromise between initialisation at the point
of declaration like Ocaml, or arbitrary initialisation
(onus on programmer) as in C++.
The language picked a constraint that is easy to check
for both the compiler and for humans which is more
flexible than the Ocaml model, but still assures
freedom from unitialised variable errors. It still
can't handle complex cases, which can be done in C++,
but they're rarely needed because if they seem
to be needed there is a good chance the design
is flawed.
So long as I can declare variables when I first need them, I've never had
a problem with initialization as part of allocation. Using an
uninitialized variable is *always* wrong. I will occassionaly not
initialize variables in C, because I have to declare all my variables at
the begining of the function, and I don't know the correct initial value
at that point. In Ocaml, and in Java, I simply move the variable
declaration up to where I do know it's initial value.
Note that Java is even more imprecise about when objects get freed than
Ocaml is. In Ocaml, the GC runs inside the same thread as the main
program. In Java, it's specified that the GC runs in a seperate thread.
Meaning that destructors execute asynchronous from the main program. If
the destructor for object X accesses object Y which is not garbage, and
the main thread is also accessing object Y, you have a possible race
condition. You can not write non-multithreaded programs in Java.
Were I writting the spec for Ocaml, I'd not only allow for the possibility
of GC running in it's own thread, I'd also allow for the possibility of
multiple threads of GC running in parallel.
Post by skallerNormally I wouldn't. But here there was no choice.
You have to understand first that Interscript is a library
read a line
if at end of file, delete user space else
otherwise if we're in tangle mode
write to current source file
otheriwse we have to be in document mode
write to current document
repeat
@h = tangler('src/x.ml')
@select(h)
print_endline "I'm an ocaml program"
<EOF>
Do you see the problem? The open file lives
in USER space. The language syntax does not
require the user close open files, because
that would leave a dangling reference.
But the engine has no idea which things
are files that need closing, and which
are documents that need tables of contents.
The ONLY way to finalise
each object correctly is in the destructor
method (__del__ method). I could add a
'finalise' method to each object and call it
but that would not help -- the __del__
method is precisely that anyhow,
and it gets invoked automatically so it
relieves me of the task of finding
all the objects.
You can see that the problem arises
from a design decision not to require
the USER to close files.
And from the users doing programming by coincidence. Looks like you're
going to have to implement reference counting.
Post by skallerPost by Brian HurtIf it matters when the buffers are flushed, manually flush the buffers.
See above. It isn't always possible: the program doesn't
always know which files are open. This is quite typical
in object oriented programs -- the program doesn't know
what kinds of objects exist, there is no master
that can call the 'finalise' method.
In C++ code, particularly GUI code, callbacks often
lead to suicide 'delete this' simply because there
is no one around to delete an object.
Yep. IMHO, OO pretty much demands GC. But if the program doesn't know
what is going on, and when it's releasing the last reference to an object,
how is the compiler supposed to know? Welcome to the general case.
Post by skallerPost by Brian HurtPost by skallerOn thing is for sure .. I *hate* seeing
"program terminated with uncaught "Not_found" exception"
because I do hundreds of Hashtbl.find calls....
There is a religous debate between returning 'a and throwing an exception
if the item isn't found, or returning 'a option.
I don't think it is religious. There is a genuine problem here.
I do not know how to state it exactly, but I'll try.
Checking error codes and poping the stack is not only
tedious and obscures the normal control flow,
it does so to such an extent in certain cases
as to be totally and unequivocably out of the question.
For example in math calculations you simply cannot
afford to check every single function call either
x = (a + b/c ^ d)
for example would turn into many lines of spagetti.
"the error detection is too localised"
for this kind of code. On the other hand the
uncaught exception of dynamic EH clearly
shows that that mechanism leads to
"the error detection is too global"
What alternatives are there?
One is to have exception specifications on functions,
but that is known not to work very well. The first
difficulty is that once you have them, they must
become part of the type system. They then 'snowball'
through the whole program (in the same way 'const'
does to C programs).
Type inference is your friend here, as it relieves the programmer of a lot
of burden of handling more complex types. But the fundamental problem is
that we want the programmer to think about and handle error cases, and
many programmers doesn't want to as that's extra work.
Post by skallerIt isn't possible to deduce what exceptions can
be thrown when functions are passed as arguments
to functions, so this pollution of the type system
would also be manifest in code in the form
of annotations .. basically higher order functions
would be screwed completely by this.
No. If a function is defined to take an argument of type "unit -> int",
and you try to pass it a function of type "unit -> int throws Not_found"
(or whatever the syntax is), this is a type error. The other direction
should be ok, however.
Some generality would be needed. You'd want to be able to express a
function type like:
val foo: (unit -> int throws 'a) -> int throws 'a
Post by skallerSo exception specifications are out for
'engineering' reasons.
Another alternative is static exception handling.
I tried that in Felix. It is very good for a
wide class of problems. Static EH implies
block structured code with the handler visible
from the throw point... its really a structured goto.
I'd have to take a look at this.
Post by skallerThis solves many cases where we really wanted
'alternate control flow constructs' rather
than error handling, but not all. And it doesn't
work where a more dynamic transmission of
error notifications is required.
What else? Continuations? Can monads be used?
We really do need a mechanism with better
locality (easily control scope).
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
-------------------
To unsubscribe, mail caml-list-***@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners