Discussion:
What is Expressiveness in a Computer Language
Joachim Durchholz
2006-06-14 18:55:40 UTC
For example,
if you have to code everything as natural numbers, untyped pure lambda
calculus or S-expressions, there is a good chance that you can get
nonsense past the compiler.
Also past peer review and/or debugging runs. And, most importantly, past
your own mental review processes.

Regards,
Jo
Marshall
2006-06-25 20:02:45 UTC
A type system that required an annotation on all subprograms that do not
provably terminate, OTOH, would not impact expressiveness at all, and would
be very useful.
Interesting. I have always imagined doing this by allowing an
annotation on all subprograms that *do* provably terminate. If
you go the other way, you have to annotate every function that
uses general recursion (or iteration if you swing that way) and that
seems like it might be burdensome.
Not at all. Almost all subprograms provably terminate (with a fairly
easy proof), even if they use general recursion or iteration.
Well, um, hmmm. If a subprogram uses recursion, and it is not
structural recursion, then I don't know how to go about proving
it terminates. Are there termination-proof techniques I don't
know about?

If we can specify a well-founded order on the domain, or on some
subcomponent of the domain, and show that recursive calls are always
invoked with arguments strictly smaller (in that order) than those of
the parent call, then we have a termination proof.

(If we don't use recursion at all, and we only invoke
proven-terminating functions, same thing; actually
this is a degenerate case of the above.)
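
For illustration, a minimal Java sketch (hypothetical code, not from the
original posts) of the kind of argument involved: the recursive call always
receives a strictly smaller value of a measure that is bounded below, so the
chain of calls must be finite.

class TerminationDemo {
    // The measure is n itself: each recursive call gets n - 1, which is
    // strictly smaller than n and never drops below 0.
    static int sumTo(int n) {
        if (n <= 0) return 0;      // base case: the measure has bottomed out
        return n + sumTo(n - 1);   // n - 1 < n and n - 1 >= 0
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10));   // 55
    }
}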

For iteration? Okay, a bounded for-loop is probably easy,
but a while loop? What about a function that calculates
the next prime number larger than its argument? Do we
have to embed a proof of an infinity of primes somehow?
That seems burdensome.
If it were not the case that almost all functions provably terminate,
then the whole idea would be hopeless.
I agree!
If a subprogram F calls G, then
in order to prove that F terminates, we probably have to prove that G
terminates. Consider a program where only half of all subprograms are
annotated as provably terminating. In that case, we would be faced with
very many cases where the proof cannot be discharged, because an
annotated subprogram calls an unannotated one.
Right, and you'd have to be applying the non-terminating annotation
all over the place.
If, on the other hand, more than, say, 95% of subprograms provably
terminate, then it is much more likely that we (or the inference
mechanism) can easily discharge any particular proof involving more
than one subprogram. So provably terminating should be the default,
and other cases should be annotated.
Well, I can still imagine that the programmer doesn't care to have
non-termination examined for every part of his code. In which case,
he would still be required to add annotations even when he doesn't
care about a particular subprogram's lack of a termination proof.

The pay-for-it-only-if-you-want-it approach has some benefits.
On the other hand, if it really is as common and easy as you
propose, then annotating only when no proof is available is
perhaps feasible.

I'm still a bit sceptical, though.
I do not know how well such a type system would work in practice; it may
be that typical programs would involve too many non-trivial proofs. This
is something that would have to be tried out in a research language.
Yeah.


Marshall
David Hopwood
2006-06-21 23:01:36 UTC
I think what this highlights is the fact that our existing terminology
is not up to the task of representing all the possible design
choices we could make. Some parts of dynamic vs. static
are mutually exclusive; some parts are orthogonal.
Really? I can see that in a strong enough static type system, many
dynamic typing features would become unobservable and therefore would be
pragmatically excluded from any probable implementations... but I don't
see any other kind of mutual exclusion between the two.
Well, it strikes me that some of what the dynamic camp likes
is the actual *absence* of declared types, or of the necessity
of having them.
So why aren't they happy with something like, say, Alice ML, which is
statically typed, but has a "dynamic" type and type inference? I mean
this as a serious question.
At the very least, requiring types vs. not requiring
types is mutually exclusive.
Right, but it's pretty well established that languages that don't
require type *declarations* can still be statically typed.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Marshall
2006-06-22 16:55:59 UTC
In this simple example,
the static case is better, but this is not free, and the cost
of the static case is evident elsewhere, but maybe not
illuminated by this example.
Ugh, please forgive my ham-fisted use of the word "better."
Let me try again:

In this simple example, the static case provides you with
a guarantee of type safety, but this is not free, and the
cost of the static case may be evident elsewhere, even
if not illuminated by this example.


Marshall
Pascal Costanza
2006-06-23 11:46:47 UTC
Consider a simple expression like 'a + b': In a dynamically typed
language, all I need to have in mind is that the program will attempt to
add two numbers. In a statically typed language, I additionally need to
know that there must be a guarantee that a and b will always hold numbers.
I still don't really see the difference.
I would not expect that the dynamic programmer will be
thinking that this code will have two numbers most of the
time but sometimes not, and fail. I would expect that in both
static and dynamic, the thought is that that code is adding
two numbers, with the difference being the static context
gives one a proof that this is so.
There is a third option: it may be that at the point where I am writing
this code, I simply don't care yet whether a and b will always be
numbers. In case something other than a number pops up, I can then decide
how to proceed from there.
In this simple example,
the static case is better, but this is not free, and the cost
of the static case is evident elsewhere, but maybe not
illuminated by this example.
Yes, maybe the example is not the best one. This kind of example,
however, occurs quite often when programming in an object-oriented
style, where you don't know yet what objects will and will not respond
to a message / generic function. Even in the example above, it could be
that you can give an appropriate definition for + for objects other than
numbers.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Marshall
2006-06-23 21:52:59 UTC
I can't see how you'd call + on a and b if you think they might
not be numbers.
Now substitute "<" for "+" and see if you can make the same argument. :-)
If your point is about overloading, then I don't see how it affects my
point. My point was, I don't see why you'd be writing code using
operators that you thought might not be applicable to the operands.


Marshall
Darren New
2006-06-23 22:23:19 UTC
Post by Marshall
point. My point was, I don't see why you'd be writing code using
operators that you thought might not be applicable to the operands.
I see. I thought your point was that you might write code for which you
didn't know the type of arguments. Unless you equate "type" with what
smalltalk calls "protocol" (i.e., the type is the collection of
operators applicable to values in the type), these are two different
statements.
--
Darren New / San Diego, CA, USA (PST)
Native Americans used every part
of the buffalo, including the wings.
Andreas Rossberg
2006-06-22 11:53:26 UTC
Consider a simple expression like 'a + b': In a dynamically typed
language, all I need to have in mind is that the program will attempt to
add two numbers. In a statically typed language, I additionally need to
know that there must be a guarantee that a and b will always hold numbers.
I'm confused. Are you telling me that you just write a+b in your programs
without trying to ensure that a and b are in fact numbers??

- Andreas
Andreas Rossberg
2006-06-22 15:21:31 UTC
For example, sort doesn't need to know what type the objects it sorts
are. It only needs to be given a function that is able to compare the
objects.
Sure. That's why any decent type system supports polymorphism of this
sort. (And some of them can even infer which comparison function to pass
for individual calls, so that the programmer does not have to bother.)
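
For illustration, a small Java sketch (hypothetical code, not from the
original post): the sort routine never inspects the element type, it only
applies the comparison function supplied by the caller.

import java.util.*;

class SortDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("lambda", "apply", "eval"));
        words.sort(Comparator.naturalOrder());   // comparison passed in explicitly
        System.out.println(words);               // [apply, eval, lambda]
    }
}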

- Andreas
David Hopwood
2006-06-25 21:36:10 UTC
Huh? There is a huge, fundamental difference: namely whether a type
system is sound or not. A soundness proof is obligatory for any serious
type theory, and failure to establish it simply is a bug in the theory.
So you claim Java and Objective C are "simply bugs in the theory."
Java's type system was unsound for much of its life. I think that supports
the point that it's inadequate to simply "wish and hope" for soundness,
and that a proof should be insisted on.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Joe Marshall
2006-06-20 17:09:01 UTC
Agreed. That is why there is the qualifier `dynamic'. This indicates
that it is a completely different thing from static types.
If we agree about this, then there is no need to continue this
discussion. I'm not sure we do agree, though, because I doubt we'd be
right here in this conversation if we did.
I think we do agree.

The issue of `static vs. dynamic types' comes up about twice a year in
comp.lang.lisp. It generally gets pretty heated, but eventually people
come to understand what the other person is saying (or they get bored
and drop out of the conversation - I'm not sure which). Someone always
points out that the phrase `dynamic types' really has no meaning in the
world of static type analysis. (Conversely, the notion of a `static
type' that is available at runtime has no meaning in the dynamic
world.) Much confusion usually follows.

You'll get much farther in your arguments by explaining what you mean
in detail rather than attempting to force a unification of terminology.
You'll also get farther by remembering that many of the people here
have not had much experience with real static type systems. The static
typing of C++ or Java is so primitive that it is barely an example of
static typing at all, yet these are the most common examples of
statically typed languages people typically encounter.
David Hopwood
2006-06-21 04:29:40 UTC
[...] NOW that being said, I think
that the reason I like Haskell, a very strongly typed language, is that
because of its type system, the language is able to do things like
lazy evaluation, [...]
Lazy evaluation does not depend on, nor is it particularly helped by
static typing (assuming that's what you mean by "strongly typed" here).

An example of a non-statically-typed language that supports lazy evaluation
is Oz. (Lazy functions are explicitly declared in Oz, as opposed to Haskell's
implicit lazy evaluation, but that's not because of the difference in type
systems.)
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
David Hopwood
2006-06-24 23:06:58 UTC
2) I want to run my program, even though it is broken, and I
want to run right up to a broken part and trap there, so I can
use the runtime facilities of the language to inspect what's
going on.
I do this quite often. Sometimes I'll develop `in the debugger'. I'll
change some piece of code and run the program until it traps. Then,
without exiting the debugger, I'll fix the immediate problem and
restart the program at the point it trapped. This technique takes a
bit of practice, but if you are working on something that's complex and
has a lot of state, it saves a lot of time because you don't have to
reconstruct the state every time you make a change.
The problem with this is that from that point on, what you're running
is neither the old nor the new program, since its state may have been
influenced by the bug before you corrected it.

I find it far simpler to just restart the program after correcting
anything. If this is too difficult, I change the design to make it
less difficult.
Wow, interesting.
(I say the following only to point out differing strategies of
development, not to say one or the other is right or bad or
whatever.)
I occasionally find myself doing the above, and when I do,
I take it as a sign that I've screwed up. I find that process
excruciating, and I do everything I can to avoid it. Over the
years, I've gotten more and more adept at trying to turn
as many bugs as possible into type errors, so I don't have
to run the debugger.
Now, there might be an argument to be made that if I had
been using a dynamic language, the process wouldn't be so painful.
As a strawman: suppose there are two broad categories
of mental strategies for thinking about bugs, and people
fall naturally into one way or the other, the way there
are morning people and night people. One group is
drawn to the static analysis, one group hates it.
One group hates working in the debugger, one group
is able to use that tool very effectively and elegantly.
Anyway, it's a thought.
I don't buy this -- or at least, I am not in either group.

A good debugger is invaluable regardless of your attitude to type
systems. Recently I was debugging two programs written to do similar
things in the same statically typed language (C), but where a debugger
could not be used for one of the programs. It took maybe 5 times
longer to find and fix each bug without the debugger, and I found it
a much more frustrating experience.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Pascal Costanza
2006-06-27 20:57:34 UTC
Clearly, in this example, the program
is invoking an operation (division) on values that are not appropriate
(zero for the second argument). Hence, if your definition really is a
definition, then this must qualify.
Here, you are asking more from dynamic type systems than any existing
type system provides. The definition of what is considered to be a type
error and what not is always part of the specification of a type system.
No, I'm not. Were we to follow this path of definition, it would not
require that dynamic type systems catch ALL type errors; only that they
catch some. Not that it matters, though. I am analyzing the logical
consequences of your own definition. If those logical consequences were
impossible to fulfill, that would be an even more convincing reason to
find a new definition. Concepts of fairness don't enter into the
equation.
Yes it does. All static type systems define only a strict subset of all
possible errors to be covered, and leave others as runtime errors. The
same holds for dynamic type systems.
If you want to make a statement instead of the sort you've implied
above... namely that a type error is *any* error that's raised by a type
system, then that's fine. It certainly brings us closer to static
types. Now, though, the task is to define a type system without making
a circular reference to types. You've already rejected the statement
that all runtime errors are type errors, because you specifically reject
the division by zero case as a type error for most languages. What is
that distinction?
I don't need such a distinction. I can just define what is covered by a
type system and what is not. All type systems work that way.
You've got to define something... or, I suppose, just go on using words
without definitions. You claimed to give a definition, though.
I have been led by others to believe that the right way to go in the
dynamic type sense is to first define type errors, and then just call
anything that finds type errors a type system. That would work, but it
would require a definition of a type error. You seem to want to push that work off of
the word "type error" and back onto "type system", but that requires
that you define what a type system is.
I didn't know I was supposed to give a definition.
Incidentally, in the case of static type systems, we define the system
(for example, using the definition given from Pierce many times this
thread), and then infer the definition of types and type errors from
there. Because a solid definition has been given first for a type
system without invoking the concepts of types or type errors, this
avoids being circular.
Do you mean this one? "A type system is a tractable syntactic method for
proving the absence of certain program behaviors by classifying phrases
according to the kinds of values they compute." This isn't really such a
strong definition because it shifts the problem of what exactly a type
system is to finding a definition for the word "kind".

But if this makes you happy, here is an attempt:

"A dynamic type system is a tractable syntactic method for proving the
absence of certain program behaviors by classifying values according to
their kinds."

Basically, this correlates with the typical approach of using tags to
indicate the type/kind/class of values. And indeed, this is what dynamic
type systems typically do: they check tags associated with values. Other
kinds of runtime errors are not raised because of such checks, but
because of other reasons. Therefore, there is naturally a distinction
between runtime errors that are type errors and those that are not.
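
For illustration, a minimal Java sketch (hypothetical code, not from the
original post) of the kind of tag check described here: the operation
inspects the tags (runtime classes) of its operands before acting on them,
and only errors produced by that check count as dynamic type errors.

class TagCheckDemo {
    static Object plus(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer)   // the tag check
            return (Integer) a + (Integer) b;
        throw new RuntimeException("type error: + applied to non-numbers");
    }

    public static void main(String[] args) {
        System.out.println(plus(1, 2));       // 3
        System.out.println(plus(1, "two"));   // raises the "type error"
    }
}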

I am not convinced that this definition of mine is a lot clearer than
what I have said before, but I am also not convinced that Pierce's
definition is any clearer either. It is merely a characterization of
what static type systems do.

The preciseness comes into play when actually defining a type system:
then you have to give definitions of what the errors at runtime are that
you want to prove absent by way of the static type system, give the
typing rules for the type system, and then prove soundness by showing
that the typing rules correlate precisely to the runtime errors in the
first stage. Since you have to map two different views of the same thing
to each other you have to be precise in giving definitions that you can
then successfully use in your proofs.

In a dynamic type system, this level of precision is not necessary because
proving that dynamic type errors are what the dynamic type system catches
is trivially true, and most programmers don't care that there is a
distinction between runtime type errors and runtime "other" errors. But
this doesn't mean that this distinction doesn't exist.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Raffael Cavallaro
2006-06-16 15:37:45 UTC
On 2006-06-16 11:29:12 -0400, Raffael Cavallaro
In software like this it isn't worth satisfying a static type checker
because you don't get much of the benefit
anywaytext?Dx?description?text?Dx?fromname
as being able to run and test portions of a program before other parts
are written (forward references to as yet nonexistent functions).
I don't know what bizarre key combination I accidentally hit here, but the
original read:

In software like this it isn't worth satisfying a static type checker
because you don't get much of the benefit anyway and it means forgoing
such advantages of dynamic typing as being able to run and test
portions of a program before other parts are written (forward
references to as yet nonexistent functions).
Anton van Straaten
2006-06-25 07:38:53 UTC
But since the relevant feature that the languages in question possess is
dynamic tagging, it is more precise and accurate to use that term to
describe them.
So you're proposing to call them dynamically-tagged languages?
Also, dynamic tagging is only a minor help in this respect, as evidenced
by the fact that explicit tag tests are quite rarely used by most programs,
if I'm not mistaken.
It sounds as though you're not considering the language implementations
themselves, where tag tests occur all the time - potentially on every
operation. That's how "type errors" get detected. This is what I'm
referring to when I say that dynamic tags support latent types.

Tags are absolutely crucial for that purpose: without them, you have a
language similar to untyped lambda calculus, where "latent type errors"
can result in very difficult to debug errors, since execution can
continue past errors and produce completely uninterpretable results.
IMHO, the support does not go far enough for it to be
considered a defining characteristic of these languages.
Since tag checking is an implicit feature of language implementations
and the language semantics, it certainly qualifies as a defining
characteristic.
When tag tests are used implicitly by other language features such as
pattern matching and dynamic dispatch, they are used for purposes that are
equally applicable to statically typed and non-(statically-typed) languages.
A fully statically-typed language doesn't have to do tag checks to
detect static type errors.

Latently-typed languages do tag checks to detect latent type errors.

You can take the preceding two sentences as a summary definition for
"latently-typed language", which will come in handy below.
or that languages
that use dynamic tagging are "latently typed". This simply is not a
property of the language (as you've already conceded).
Right. I see at least two issues here: one is that as a matter of
shorthand, compressing "language which supports latent typing" to
"latently-typed language" ought to be fine, as long as the term's
meaning is understood.
If, for the sake of argument, "language which supports latent typing" is
to be compressed to "latently-typed language", then statically typed
languages must be considered also latently typed.
See definition above. The phrase "language which supports latent
typing" wasn't intended to be a complete definition.
After all, statically typed languages support expression and
verification of the "types in the programmer's head" at least as well
as non-(statically-typed) languages do. In particular, most recent
statically typed OO languages use dynamic tagging and are memory safe.
And they support comments ;-)
But they don't use tag checks to validate their static types.

When statically-typed languages *do* use tags, in cases where the static
type system isn't sufficient to avoid them, then indeed, those parts of
the program use latent types, in the exact same sense as more fully
latently-typed languages do. There's no conflict here, it's simply the
case that most statically-typed languages aren't fully statically typed.
This is not, quite obviously, what most people mean when they say
that a particular *language* is "latently typed". They almost always
mean that the language is dynamically tagged, *not* statically typed,
and memory safe. That is how this term is used in R5RS, for example.
The R5RS definition is compatible with what I've just described, because
the parts of a statically-typed language that would be considered
latently-typed are precisely those which rely on dynamic tags.
But beyond that, there's an issue here about the definition of "the
language". When programming in a latently-typed language, a lot of
action goes on outside the language - reasoning about static properties
of programs that are not captured by the semantics of the language.
This is true of programming in any language.
Right, but when you compare a statically-typed language to an untyped
language at the formal level, a great deal more static reasoning goes on
outside the language in the untyped case.

What I'm saying is that it makes no sense, in most realistic contexts,
to think of untyped languages as being just that: languages in which the
type of every term is simply a tagged value, as though no static
knowledge about that value exists. The formal model requires that you
do this, but programmers can't function if that's all the static
information they have. This isn't true in the case of a fully
statically-typed language.
This means that there's a sense in which the language that the
programmer programs in is not the same language that has a formal
semantic definition. As I mentioned in another post, programmers are
essentially mentally programming in a richer language - a language which
has informal (static) types - but the code they write down elides this
type information, or else puts it in comments.
If you consider stuff that might be in the programmer's head as part
of the program, where do you stop? When I maintain a program written
by someone I've never met, I have no idea what was in that programmer's
head. All I have is comments, which may be (and frequently are,
unfortunately) inaccurate.
You have to make the same kind of inferences that the original
programmer made. E.g. when you see a function that takes some values,
manipulates them using numeric operators, and returns a value, you have
to either figure out or trust the comments that the function accepts
numbers (or integers, floats etc.) and returns a number.

This is not some mystical psychological quality: it's something that
absolutely must be done in order to be able to reason about a program at
all.
(Actually, it's worse than that -- if I come back to a program 5 years
later, I probably have little idea what was in my head at the time I
wrote it.)
But that's simply the reality - you have to reconstruct: recover the
latent types, IOW. There's no way around that!

Just to be concrete, I'll resurrect my little Javascript example:

function timestwo(x) { return x*2 }

Assume I wrote this and distributed it in source form as a library, and
now you have to maintain it. How do you know what values to call it
with, or what kind of values it returns? What stops you from calling it
with a string?

Note that according to the formal semantics of an untyped language, the
type of that function is "value -> value".

The phrase "in the programmer's head" was supposed to help communicate
the concept. Don't get hung up on it.
We have to accept, then, that the formal semantic definitions of
dynamically-checked languages are incomplete in some important ways.
Referring to those semantic definitions as "the language", as though
that's all there is to the language in a broader sense, is misleading.
Bah, humbug. The language is just the language.
If you want to have this discussion precisely, you have to do better
than that. There is an enormous difference between the formal semantics
of an untyped language, and the (same) language which programmers work
with and think of as "dynamically typed".

Use whatever terms you like to characterize this distinction, but you
can't deny that the distinction exists, and that it's quite a big,
important one.
In this context, the term "latently-typed language" refers to the
language that a programmer experiences, not to the subset of that
language which is all that we're typically able to formally define.
I'm with Marshall -- this is way too mystical for me.
I said more in my reply to Marshall. Perhaps I'm not communicating
well. At worst, what I'm talking about is informal. It's easy to prove
that you can't reason about a program if you don't know what types of
values a function accepts or returns - which is what an untyped formal
model gives you.

Anton
Andreas Rossberg
2006-06-23 13:11:49 UTC
Well, it seems to me that you are /assuming/ a notion of what kinds of
logic can be called type (theories), and I don't share your
assumptions. No offence intended.
OK, but can you point me to any literature on type theory that makes a
different assumption?
'Fraid not. (I'm not a type theorist -- for all I know there may be lots, but
my suspicion is that they are rare at best.)
I would suspect the same :-). And the reason is that "type" has a
well-established use in theory. It is not just my "assumption", it has
been established practice for 80 or so years. So far, this discussion has
not revealed the existence of any formal work that would provide a
theory of "dynamic types" in the sense it is used to characterise
"dynamically typed" languages.

So what you are suggesting may be an interesting notion, but it's not
what is called "type" in a technical sense. Overloading the same term
for something different is not a good idea if you want to avoid
confusion and misinterpretations.
But, as a sort of half-way, semi-formal, example: consider the type environment
in a Java runtime. The JVM does formal type-checking of classfiles as it loads
them. In most ways that checking is static -- it's treating the bytecode as
program text and doing a static analysis on it before allowing it to run (and
rejecting what it can't prove to be acceptable by its criteria). However, it
isn't /entirely/ static because the collection of classes varies at runtime in
a (potentially) highly dynamic way. So it can't really examine the "whole"
text of the program -- indeed there is no such thing. So it ends up with a
hybrid static/dynamic type system -- it records any assumptions it had to make
in order to find a proof of the acceptability of the new code, and if (sometime
in the future) another class is proposed which violates those assumptions, then
that second class is rejected.
Incidentally, I know this scenario very well, because that's what I'm
looking at in my thesis :-). All of this can easily be handled
coherently with well-established type machinery and terminology. No need
to bend existing concepts and language, no need to resort to "dynamic
typing" in the way Java does it either.
In code which will be executed at instant A
obj aMessage. "type correct"
obj anotherMessage. "type incorrect"
In code which will be executed at instant B
obj aMessage. "type incorrect"
obj anotherMessage. "type correct"
I don't see how a logic with no temporal element can arrive at all four of those
judgements, whatever it means by a union type.
I didn't say that the type system cannot have temporal elements. I only
said that a type itself cannot change over time. A type system states
propositions about a program, a type assignment *is* a proposition. A
proposition is either true or false (or undecidable), but it is so
persistently, considered under the same context. So if you want a type
system to capture temporal elements, then these must be part of a type
itself. You can introduce types with implications like "in context A,
this is T, in context B this is U". But the whole quoted part then is
the type, and it is itself invariant over time.

- Andreas
Rob Thorpe
2006-06-20 14:57:41 UTC
"A language is latently typed if a value has a property - called it's
type - attached to it, and given it's type it can only represent values
defined by a certain class."
"it [= a value] [...] can [...] represent values"?
???
I just quoted, in condensed form, what you said above: namely, that a
value represents values - which I find a strange and circular definition.
Yes, but the point is, as the other poster mentioned: values defined by
a class.
For example, in lisp:
"xyz" is a string, #(1 2 3) is an array, '(1 2 3) is a list, 45 is a
fixed-point number.
Each item that can be stored has a type, no item can be stored without
one. The types can be tested. Most dynamically typed languages are like
that.

Compare this to an untyped language where types cannot generally be
tested.
A (static) type system assigns types to (all) *expressions*.
That's right most of the time, yes; I probably should have said
expressions. Though I can think of statically typed languages where the
resulting type of an expression depends on the type of the variable it
is being assigned to.
Yes, but that's no contradiction. A type system does not necessarily
assign *unique* types to individual expressions (consider overloading,
subtyping, polymorphism, etc).
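
For illustration, a small Java sketch (hypothetical code, not from the
original post): the identity lambda is one expression form whose type is not
unique; it depends on the target it is assigned to, much as described above.

import java.util.function.*;

class ContextTypeDemo {
    public static void main(String[] args) {
        // The same expression form, x -> x, receives different types
        // depending on the target type it is being assigned to.
        Function<Integer, Integer> f = x -> x;
        UnaryOperator<String> g = x -> x;
        System.out.println(f.apply(42));     // 42
        System.out.println(g.apply("hi"));   // hi
    }
}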
That's a fair way of looking at it.
Well I haven't programmed in any statically typed language where values
have types themselves.
They all have - the whole purpose of a type system is to ensure that any
expression of type T always evaluates to a value of type T.
But it only guarantees this because the variables themselves have a
type, the values themselves do not. Most of the time, the fact that the
variable they are held in has a type implies that the value does too.
But the value itself has no type: in a C program, for example, I can take
the value from some variable and recast it in any way I feel, and the
language cannot correct any errors I make because there is no
information in the variable to indicate what type it is.
So when you
look at type systems formally then you certainly have to assign types to
values, otherwise you couldn't prove any useful property about those
systems (esp. soundness).
Yes, but indirectly.
Joachim Durchholz
2006-06-22 19:51:37 UTC
immutable = can't change
vary-able = can change
Clearly a contradiction in terms.
Not in mathematics.
The understanding there is that a "variable" varies - not over time, but
according to the whim of the usage. (E.g. if a function is displayed in
a graph, the parameter varies along the X axis. If it's used within a
term, the parameter varies depending on how it's used. Etc.)

Similarly for computer programs.
Of course, if values are immutable, the value associated with a
parameter name cannot vary within the same invocation - but it can still
vary from one invocation to the next.

Regards,
Jo
Ketil Malde
2006-06-20 15:34:21 UTC
Post by Rob Thorpe
But it only guarantees this because the variables themselves have a
type, the values themselves do not.
I think statements like this are confusing, because there are
different interpretations of what a "value" is. I would say that the
integer '4' is a value, and that it has type Integer (for instance).
This value is different from 4 the Int16, or 4 the double-precision
floating point number. From this viewpoint, all values in statically
typed languages have types, but I think you use 'value' to denote the
representation of a datum in memory, which is a different thing.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
Rob Thorpe
2006-06-20 17:20:33 UTC
Post by Joe Marshall
The compiler
relies entirely on the types of the variables to know how to correctly
operate on the values. The values themselves have no type information
associated with them.
int x = (int) (20.5 / 3);
What machine code operations does the "/" there invoke? Integer
division, or floating point division? How did the variables involved in
the expression affect that?
In that case it knew because it could see at compile time. In general
though it doesn't.
If I divide x / y it only knows which to use because of types declared
for x and y.
Post by Joe Marshall
Casting in C takes values of one type to values of another type.
No it doesn't. Casting reinterprets a value of one type as a value of
another type.
No it doesn't.
int x = (int) 20.5;
There's no point at which bits from the floating point representation
appear in the variable x.
int * x = (int *) 0;
There's nothing that indicates all the bits of "x" are zero, and indeed
in some hardware configurations they aren't.
I suppose some are conversions and some are reinterpretations. What I
should have said is that there are cases where a cast reinterprets.
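
As an aside, the same distinction can be shown in Java (a hypothetical
sketch, not from the original exchange), which offers both operations
explicitly: a cast that converts the value, and a library call that
reinterprets the bits.

class CastDemo {
    public static void main(String[] args) {
        double d = 20.5;
        int converted = (int) d;                             // conversion: yields 20, no FP bits survive
        long reinterpreted = Double.doubleToRawLongBits(d);  // reinterpretation: the raw IEEE 754 bit pattern
        System.out.println(converted);                       // 20
        System.out.println(Long.toHexString(reinterpreted)); // 4034800000000000
    }
}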
Darren New
2006-06-20 15:52:46 UTC
Post by Rob Thorpe
Yes, but the point is, as the other poster mentioned: values defined by
a class.
A value can only represent one value, right? Or can a value have
multiple values?
Post by Rob Thorpe
"xyz" is a string,
"xyz" cannot represent values from the class of strings. It can only
represent one value.

I think that's what the others are getting at.
Post by Rob Thorpe
They all have - the whole purpose of a type system is to ensure that any
expression of type T always evaluates to a value of type T.
But it only guarantees this because the variables themselves have a
type, the values themselves do not.
Sure they do. 23.5e3 is a "real" in Pascal, and there's no variable there.

("hello" % "there") is illegal in most languages, because the modulo
operator doesn't apply to strings. How could this be determined at
compile time if "hello" and "there" don't have types?
--
Darren New / San Diego, CA, USA (PST)
My Bath Fu is strong, as I have
studied under the Showerin' Monks.
Chris Smith
2006-06-26 18:53:42 UTC
Of course zero is not appropriate as a second argument to the division
operator! I can't possibly see how you could claim that it is. The
only reasonable statement worth making is that there doesn't exist a
type system in widespread use that is capable of checking this.
...and this is typically not even checked at the stage where type tags
are checked in dynamically-typed languages. Hence, it is not a type
error. (A type error is always what you define to be a type error within
a given type system, right?)
Sure, it's not a type error for that language.
Note, this example was in response to David's reply that my definition
turns every runtime error into a type error. That's not the case, and
that's all I want to say.
But your definition does do that. Your definition of a type error was
when a program attempts to invoke an operation on values that are not
appropriate for this operation. Clearly, in this example, the program
is invoking an operation (division) on values that are not appropriate
(zero for the second argument). Hence, if your definition really is a
definition, then this must qualify.
However, it sounds as
if you're claiming that it wouldn't be possible for the type system to
do this?
No. I only need an example where a certain error is not a type error in
_some_ language. I don't need to make a universal claim here.
Definitions are understood to be statements of the form "if and only
if". They assert that /everything/ that fits the definition is
describable by the word, and /everything/ that doesn't is not
describable by the word. If that's not what you meant, then we are
still in search of a definition.

If you want to make a statement instead of the sort you've implied
above... namely that a type error is *any* error that's raised by a type
system, then that's fine. It certainly brings us closer to static
types. Now, though, the task is to define a type system without making
a circular reference to types. You've already rejected the statement
that all runtime errors are type errors, because you specifically reject
the division by zero case as a type error for most languages. What is
that distinction?
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Pascal Costanza
2006-06-28 16:14:20 UTC
Whether you consider something you cannot do with statically typed
languages a bad idea or not is irrelevant. You were asking for things
that you cannot do with statically typed languages.
The whole point of static type systems is to make sure that there are
things that one cannot do. So the fact that there are such things is
not an argument per se against static types.
I am not arguing against static type systems. I am just responding to
the question what the things are that you cannot do in statically typed
languages.
[ ... ]
Beyond that, I am convinced that the ability to update a running
system without the need to shut it down can be an important asset.
And I am convinced that updating a running system in the style of,
e.g., Erlang, can be statically typed.
Maybe. The interesting question then is whether you can express the
kinds of dynamic updates that are relevant in practice. Because a static
type system always restricts what kinds of runtime behavior you can
express in your language. I am still skeptical, because changing the
types at runtime is basically changing the assumptions that the static
type checker has used to check the program's types in the first place.

For example, all the approaches that I have seen in statically typed
languages deal with adding to a running program (say, class fields and
methods, etc.), but not with changing parts of it, or removing parts from it.
Note that prohibiting directly self-modifying code does not prevent a
program from specifying another program to *replace* it.
...and this creates problems with moving data from one version of a
program to the next.
How does this "create" such a problem? The problem is there in either
approach. In fact, I believe that the best chance we have of
addressing the problem is by adopting the "replace the code" model
along with a "translate the data where necessary at the time of
replacement". Translating the data, i.e., re-establishing the
invariants expected by the updated/replaced code, seems much harder
(to me) in the case of self-modifying code. Erlang got this one
right.
...and the "translate the date where necessary" approach is essentially
triggered by a dynamic type test (if value x is of an old version of
type T, update it to reflect the new version of type T [1]). QED.


Pascal

[1] BTW, that's also the approach taken in CLOS.
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Paul McGuire
2006-06-09 15:07:23 UTC
<I've removed the massive cross-posting - I wouldn't presume this message is
all that interesting to folks in those other NG's, and I'm sure they'd be
saying, "who the heck is Paul McGuire, and who gives a @#*$! what he
thinks?">

"Joe Marshall" <eval.apply at gmail.com> wrote in message
Expressiveness isn't necessarily a good thing. For instance, in C, you
can express the addresses of variables by using pointers. You cannot
express the same thing in Java, and most people consider this to be a
good idea.
For those who remember the bad old days of COBOL, its claim to fame was that
it was more like English prose, with the intent of making a programming
language that was as readable as English, assuming that this was more
"expressive", and not requiring as much of a mental mapping exercise for
someone trying to "read" a program. Even the language terminology itself
strived for this: statements were "sentences"; blocks were "paragraphs".
The sentence model may have ended up being one of COBOL's Achilles' heels -
the placement of terminating periods for an IF THEN ELSE block was crucial
for disambiguating which ELSE went with which IF. Unfortunately, periods
are one of the least visible printed characters, and an extra or missing
period could cause hours of extra debugging.

(Of course, at the time of COBOL's inception, the only primary languages to
compare with were assembly or FORTRAN-60, so this idea wasn't totally
unfounded.)

-- Paul
Joachim Durchholz
2006-06-25 17:52:30 UTC
Another observation: type safeness is more of a spectrum than a clearcut
distinction. Even ML and Pascal have ways to circumvent the type system,
No. I'm not sure about Pascal,
You'd have to use an untagged union type.
It's the standard maneuver in Pascal to get unchecked bitwise
reinterpretation.
Since it's an undefined behavior, you're essentially on your own, but in
practice, any compilers that implemented a different semantics were
hammered with bug reports until they complied with the standard - in
this case, an unwritten (and very unofficial) one, but a rather
effective one.
but (Standard) ML surely has none.
NLFFI?
Same with Haskell as defined by its spec.
Um... I'm not 100% sure, but I dimly (mis?)remember having read that
UnsafePerformIO also offered some ways to circumvent the type system.

(Not that this would be an important point anyway.)
OCaml has a couple of clearly
marked unsafe library functions, but no unsafe feature in the language
semantics itself.
If there's a library with not-typesafe semantics, then at least that
language implementation is not 100% typesafe.
If all implementations of a language are not 100% typesafe, then I
cannot consider the language itself 100% typesafe.

Still, even Pascal is quite a lot "more typesafe" than, say, C.
and even C is typesafe unless you use unsafe constructs.
Tautology. Every language is "safe unless you use unsafe constructs".
No tautology - the unsafe constructs aren't just typewise unsafe ;-p

That's exactly why I replaced Luca Cardelli's "safe/unsafe" with
"typesafe/not typesafe". There was no definition attached to the original
terms, and this discussion is about typing anyway.
(Unfortunately, you can hardly write interesting programs in any safe
subset of C.)
Now that's a bold claim that I'd like to see substantiated.
IOW from a type-theoretic point of view, there is no real difference
between their typesafe and not typesafe languages in the "statically
typed" column; the difference is in the amount of unsafe construct usage
in practical programs.
Huh? There is a huge, fundamental difference: namely whether a type
system is sound or not.
I think you're overstating the case.

In type theory, of course, there's no such thing as an "almost typesafe
language" - it's typesafe or it isn't.

In practice, I find no implementation where type mismatches cannot
occur, if only when interfacing with the external world (except if you
cheat by treating everything external as a byte sequence, which is like
saying that all languages have at least a universal ANY type and are
hence statically-typed).
And in those languages that do have type holes, these holes may be more
or less relevant - and it's a *very* broad spectrum here.
And from that perspective, if ML indeed has no type hole at all, then
it's certainly an interesting extremal point, but I still have a
relevant spectrum down to assembly language.

Regards,
Jo
rossberg
2006-06-25 19:42:44 UTC
Post by Joachim Durchholz
but (Standard) ML surely has none.
NLFFI?
Same with Haskell as defined by its spec.
Um... I'm not 100% sure, but I dimly (mis?)remember having read that
UnsafePerformIO also offered some ways to circumvent the type system.
Neither NLFFI nor unsafePerformIO is an official part of the respective
languages. Besides, foreign function interfaces are outside a language
by definition, and can hardly be taken as an argument - don't blame
language A for unsafety that arises when you subvert it by interfacing with
unsafe language B on a lower level.
Post by Joachim Durchholz
and even C is typesafe unless you use unsafe constructs.
Tautology. Every language is "safe unless you use unsafe constructs".
No tautology - the unsafe constructs aren't just typewise unsafe ;-p
That's exactly why I replaced Luca Cardelli's "safe/unsafe" with
"typesafe/not typesafe". There was no definition attached to the original
terms, and this discussion is about typing anyway.
The Cardelli paper I was referring to discusses it in detail. And if
you look up the context of my posting: it was exactly whether safety is
to be equated with type safety.
Post by Joachim Durchholz
IOW from a type-theoretic point of view, there is no real difference
between their typesafe and not typesafe languages in the "statically
typed" column; the difference is in the amount of unsafe construct usage
in practical programs.
Huh? There is a huge, fundamental difference: namely whether a type
system is sound or not.
I think you're overstating the case.
In type theory, of course, there's no such thing as an "almost typesafe
language" - it's typesafe or it isn't.
Right, and you were claiming to argue from a type-theoretic POV.

[snipped the rest]

- Andreas
David Hopwood
2006-06-26 22:40:10 UTC
While this effort to salvage the term "type error" in dynamic
languages is interesting, I fear it will fail. Either we'll all have
to admit that "type" in the dynamic sense is a psychological concept
with no precise technical definition (as was at least hinted by
Anton's post earlier, whether intentionally or not) or someone is
going to have to propose a technical meaning that makes sense,
independently of what is meant by "type" in a static system.
What about this: You get a type error when the program attempts to
invoke an operation on values that are not appropriate for this
operation.
Examples: adding numbers to strings; determining the string-length of a
number; applying a function on the wrong number of parameters; applying
a non-function; accessing an array with out-of-bound indexes; etc.
This makes essentially all run-time errors (including assertion failures,
etc.) "type errors". It is neither consistent with, nor any improvement
on, the existing vaguely defined usage.
Nope. This is again a matter of getting the levels right.
Consider division by zero: appropriate arguments for division are
numbers, including the zero. The dynamic type check will typically not
check whether the second argument is zero, but will count on the fact
that the processor will raise an exception one level deeper.
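
For illustration, a minimal Java sketch (hypothetical code, not from the
original post) of the two levels: the tag check accepts the arguments, and
the zero case fails one level deeper, as an arithmetic error rather than a
dynamic type error.

class LevelDemo {
    static Object divide(Object a, Object b) {
        if (!(a instanceof Integer) || !(b instanceof Integer))
            throw new RuntimeException("type error: / applied to non-numbers");
        return (Integer) a / (Integer) b;   // may still throw ArithmeticException
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));   // 5
        System.out.println(divide(1, 0));    // ArithmeticException: / by zero
    }
}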
That is an implementation detail. I don't see its relevance to the above
argument at all.
This is maybe easier to understand in user-level code. Consider the
following class:

class Person {
    String name;
    int age;

    void buyPorn() {
        if (this.age < 18) throw new AgeRestrictionException();
        ...
    }
}
The message p.buyPorn() is a perfectly valid message send and will pass
both static and dynamic type tests (given p is of type Person in the
static case). However, you will still get a runtime error.
Your definition above was "You get a type error when the program attempts
to invoke an operation on values that are not appropriate for this operation."

this.age, when less than 18, is a value that is inappropriate for the buyPorn()
operation, and so this satisfies your definition of a type error. My point was
that it's difficult to see any kind of program error that would *not* satisfy
this definition.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
David Hopwood
2006-06-24 22:00:54 UTC
I think that we're finally getting to the bottom of things. While
reading your responses something became very clear to me: latent-typing and
latent-types are not a property of languages. Latent-typing, also known as
informal reasoning, is something that all programmers do as a normal part
of programming. To say that a language is latently-typed is to make a
category mistake, because latent-typing is not a property of languages.
I disagree with you and agree with Anton. Here, it is helpful to
understand the history of Scheme a bit: parts of its design are a
reaction to what Schemers perceived as having failed in Common Lisp (and
other previous Lisp dialects).
One particularly illuminating example is the treatment of nil in Common
Lisp. That value is a very strange beast in Common Lisp because it
stands for several concepts at the same time: most importantly the empty
list and the boolean false value. Its type is also "interesting": it is
both a list and a symbol at the same time. It is also "interesting" that
its quoted value is equivalent to the value nil itself. This means that
(if nil 42 4711)
(if 'nil 42 4711)
Both forms evaluate to 4711.
It's also the case that taking the car or cdr (first or rest) of nil
doesn't give you an error, but simply returns nil as well.
The advantage of this design is that it allows you to express a lot of
code in a very compact way. See
http://www.apl.jhu.edu/~hall/lisp/Scheme-Ballad.text for a nice
illustration.
The disadvantage is that it is mostly impossible to have a typed view of
nil, at least one that clearly disambiguates all the cases. There are
also other examples where Common Lisp conflates different types, and
sometimes only for special cases. [1]
http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%25_sec_3.2
This clearly deviates strongly from Common Lisp (and other Lisp
dialects). The emphasis here is on a clear separation of all the types
specified in the Scheme standard, without any exception. This is exactly
what makes it straightforward in Scheme to have a latently typed view of
programs, in the sense that Anton describes. So latent typing is a
property that can at least be enabled / supported by a programming
language, so it is reasonable to talk about this as a property of some
dynamically typed languages.
If anything, I think that this example supports my and Vesa's point.
The example demonstrates that languages
*that are not distinguished in whether they are called latently typed*
support informal reasoning about types to varying degrees.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Marshall
2006-06-23 17:15:56 UTC
But, as a sort of half-way, semi-formal, example: consider the type environment
in a Java runtime. The JVM does formal type-checking of classfiles as it loads
them. In most ways that checking is static -- it's treating the bytecode as
program text and doing a static analysis on it before allowing it to run (and
rejecting what it can't prove to be acceptable by its criteria). However, it
isn't /entirely/ static because the collection of classes varies at runtime in
a (potentially) highly dynamic way. So it can't really examine the "whole"
text of the program -- indeed there is no such thing. So it ends up with a
hybrid static/dynamic type system -- it records any assumptions it had to make
in order to find a proof of the acceptability of the new code, and if (sometime
in the future) another class is proposed which violates those assumptions, then
that second class is rejected.
I have to object to the term "hybrid".

Java has a static type system.
Java has runtime tags and tag checks.

The two are distinct, and neither one is less than complete, so
I don't think "hybrid" captures the situation well.


Marshall
Marshall
2006-06-23 22:12:48 UTC
[me:]
Post by Marshall
But, as a sort of half-way, semi-formal, example: consider the type
environment in a Java runtime. The JVM does formal type-checking of
classfiles as it loads them. In most ways that checking is static --
it's treating the bytecode as program text and doing a static analysis
on it before allowing it to run (and rejecting what it can't prove to
be acceptable by its criteria). However, it isn't /entirely/ static
because the collection of classes varies at runtime in a (potentially)
highly dynamic way. So it can't really examine the "whole" text of the
program -- indeed there is no such thing. So it ends up with a hybrid
static/dynamic type system -- it records any assumptions it had to make
in order to find a proof of the acceptability of the new code, and if
(sometime in the future) another class is proposed which violates those
assumptions, then that second class is rejected.
I have to object to the term "hybrid".
Java has a static type system.
Java has runtime tags and tag checks.
It has both, agreed, but that isn't the full story. I think I explained what I
meant, and why I feel the term is justified as well as I can in the second half
of the paragraph you quoted. I doubt if I can do better.
Maybe you are thinking that I mean that /because/ the JVM does verification,
etc, at "runtime" the system is hybrid ?
Anyway that is /not/ what I mean. I'm (for these purposes) completely
uninterested in the static checking done by the Java to bytecode translator,
javac. I'm interested in what happens to the high-level, statically typed, OO,
language called "java bytecode" when the JVM sees it. That language has a
strict static type system which the JVM is required to check. That's a
/static/ check in my book -- it happens before the purportedly correct code is
accepted, rather than while that code is running.
I am also completely uninterested (for these immediate purposes) in the run
time checking that the JVM does (the stuff that results in
NoSuchMethodException, and the like). In the wider context of the thread, I do
want to call that kind of thing (dynamic) type checking -- but those checks are
not why I call the JVMs type system hybrid either.
Oh well, having got that far, I may as well take another stab at "hybrid".
Since the JVM is running a static type system without access to the whole text
of the program, there are some things that it is expected to check which it
can't. So it records preconditions on classes which might subsequently be
loaded. Those are added to the static checks on future classes, but -- as far
as the original class is concerned -- those checks happen dynamically. So part
of the static type checking which is supposed to happen, has been postponed to
a dynamic check. It's that, and /only/ that which I meant by "hybrid".
That's fair, I guess. I wouldn't use the terms you do, but maybe that
doesn't matter very much assuming we understand each other
at this point.



Marshall
Pascal Costanza
2006-06-16 09:59:25 UTC
On a similar note, is a statically typed language more or less
expressive than a dynamically typed language? Some would say less, as
you can write programs in a dynamically typed language that you can't
compile in a statically typed language (without a lot of encoding),
whereas the converse isn't true.
It's important to get the levels right here: A programming language
with a rich static type system is more expressive at the type level,
but less expressive at the base level (for some useful notion of
expressiveness ;).
However, I think this is misleading,
as it ignores the feedback issue: It takes longer for the average
programmer to get the program working in the dynamically typed
language.
This doesn't seem to capture what I hear from Haskell programmers who
say that it typically takes quite a while to convince the Haskell
compiler to accept their programs. (They perceive this to be
worthwhile because of some benefits wrt correctness they claim to get
in return.)
That's the point: Bugs that in dynamically typed languages would
require testing to find are found by the compiler in a statically
typed language.
Yes. However, unfortunately statically typed languages also reject
programs that don't have such bugs. It's a tradeoff whether you want to
spend time to deal with them or not.
So while it may take longer to get a program that gets
past the compiler, it takes less time to get a program that works.
That's incorrect. See http://haskell.org/papers/NSWC/jfp.ps - especially
Figure 3.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Matthias Blume
2006-06-23 13:18:09 UTC
Permalink
...
An example of a form of informal reasoning that (practically) every
programmer does daily is termination analysis. There are type systems
that guarantee termination, but I think it is fair to say that it is not
yet understood how to make a practical general-purpose language whose
type system would guarantee termination (or at least I'm not aware of
such a language). It should also be clear that termination analysis need
not be done informally. Given a program, it may be possible to formally
prove that it terminates.
To make the halting problem decidable one would have to do one of two
things: Depend on memory size limits, or have a language that really is
less expressive, at a very deep level, than any of the languages
mentioned in the newsgroups header for this message.
Not quite. See http://en.wikipedia.org/wiki/ACL2
What do you mean "not quite"? Of course, Patricia is absolutely
right. Termination-guaranteeing languages are fundamentally less
expressive than Turing-complete languages. ACL2 was not mentioned in
the newsgroup header.
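
As an aside, here is a minimal Haskell sketch (my own illustration, not
from the thread or from ACL2) of the kind of easy termination argument
alluded to above: termination follows from a simple decreasing measure
on the recursive call.

-- Euclid's algorithm. For non-negative inputs, the second argument
-- strictly decreases on every recursive call (it is replaced by a
-- remainder smaller than it), so a measure argument proves termination.
euclid :: Integer -> Integer -> Integer
euclid a 0 = a
euclid a b = euclid b (a `mod` b)
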
David Hopwood
2006-06-27 21:57:13 UTC
Permalink
The real question is, are there some programs that we
can't write *at all* in a statically typed language, because
they'll *never* be typable?
In a statically typed language that has a "dynamic" type, all
dynamically typed programs are straightforwardly expressible.
What about programs where the types change at runtime?
Staged compilation is perfectly compatible with static typing.
Static typing only requires that it be possible to prove absence
of some category of type errors when the types are known; it
does not require that all types are known before the first-stage
program is run.

There are, however, staged compilation systems that guarantee that
the generated program will be typeable if the first-stage program
is.

(It's clear that to compare the expressiveness of statically and
dynamically typed languages, the languages must be comparable in
other respects than their type system. Staged compilation is the
equivalent feature to 'eval'.)
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Chris Smith
2006-06-21 18:34:37 UTC
Permalink
I think what this highlights is the fact that our existing terminology
is not up to the task of representing all the possible design
choices we could make. Some parts of dynamic vs. static
are mutually exclusive; some parts are orthogonal.
Really? I can see that in a strong enough static type system, many
dynamic typing features would become unobservable and therefore would be
pragmatically excluded from any probable implementations... but I don't
see any other kind of mutual exclusion between the two.
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Marshall
2006-06-23 01:40:56 UTC
Permalink
That's the important point: I want to run broken code.
I want to make sure I understand. I can think of several things
you might mean by this. It could be:
1) I want to run my program, even though I know parts of it
are broken, because I think there are parts that are not broken
and I want to try them out.
2) I want to run my program, even though it is broken, and I
want to run right up to a broken part and trap there, so I can
use the runtime facilities of the language to inspect what's
going on.
I want to run
as much of the working fragments as I can, and I want a `safety net' to
prevent me from performing undefined operations, but I want the safety
net to catch me at the *last* possible moment.
This statement is interesting, because the conventional wisdom (at
least as I'm used to hearing it) is that it is best to catch bugs
at the *first* possible moment. But I think maybe we're talking
about different continua here. The last last last possible moment
is after the software has shipped to the customer, and I'm pretty
sure that's not what you mean. I think maybe you mean something
more like 2) above.


Marshall
Pascal Costanza
2006-06-23 12:01:51 UTC
Permalink
Post by Marshall
That's the important point: I want to run broken code.
I want to make sure I understand. I can think of several things
1) I want to run my program, even though I know parts of it
are broken, because I think there are parts that are not broken
and I want to try them out.
2) I want to run my program, even though it is broken, and I
want to run right up to a broken part and trap there, so I can
use the runtime facilities of the language to inspect what's
going on.
I want to run
as much of the working fragments as I can, and I want a `safety net' to
prevent me from performing undefined operations, but I want the safety
net to catch me at the *last* possible moment.
This statement is interesting, because the conventional wisdom (at
least as I'm used to hearing it) is that it is best to catch bugs
at the *first* possible moment. But I think maybe we're talking
about different continua here. The last last last possible moment
is after the software has shipped to the customer, and I'm pretty
sure that's not what you mean. I think maybe you mean something
more like 2) above.
Nowadays, we have more options wrt what it means to "ship" code. It
could be that your program simply runs as a (web) service to which you
have access even after the customer has started to use the program. See
http://www.paulgraham.com/road.html for a good essay on this idea.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Chris F Clark
2006-06-27 04:37:05 UTC
Permalink
The important thing is the dynamicism of lisp allowed one to write
polymorphic programs, before most of us knew the term.
Sure. In exchange for giving up the proofs of the type checker, you
could write all kinds of programs. To this day, we continue to write
programs in languages with dynamic checking features because we don't
have powerful enough type systems to express the corresponding type
system.
And to me the question is what kinds of types apply to these dynamic
programs, where in fact you may have to solve the halting problem to
know exactly when some statement is executed. I expect that some
programs have type signatures that exceed the expressibility of any
static system (that is Turing complete). Therefore, we need something
that "computes" the appropriate type at run-time, because we need full
Turing power to compute it. To me such a system is a "dynamic type
system". I think dynamic tags are a good approximation, because they
only compute what type the expression has this time.
I believe that, in fact, it would be trivial to imagine a type system
which is capable of expressing that type. Okay, not trivial, since it
appears to be necessary to conceive of an infinite family of integer
types with only one value each, along with range types, and
relationships between them; but it's probably not completely beyond the
ability of a very bright 12-year-old who has someone to describe the
problem thoroughly and help her with the notation.
Well, it looks like you are right, in that what I see following is a
Haskell program that looks essentially correct. I wanted something that was
simple enough that one could see that it could be computed, but which
was complex enough that it had to be computed (and couldn't be
statically expressed with a system that did no "type computations").
Perhaps, there is no such beast. Or, perhaps I just can't formulate
it. Or, perhaps we have static type checkers which can do
computations of unbounded complexity. However, I thought that one of
the characteristics of type systems was that they did not allow
unbounded complexity and weren't Turing Complete.
You would, of course, need a suitable type system first. For example,
it appears to me that there is simply no possible way of expressing what
you've described in Java, even with the new generics features. Perhaps
it's possible in ML or Haskell (I don't know). My point is that if you
were allowed to design a type system to meet your needs, I bet you could
do it.
Or, I could do as I think the dynamic programmers do, dispense with
trying to formulate a sufficiently general type system and just check
the tags at the appropriate points.
Sure. The important question, then, is whether there exists any program
bug that can't be formulated as a type error.
If you allow Turing Complete type systems, then I would say no--every
bug can be reformulated as a type error. If you require your type
system to be less powerful, then some bugs must escape it.

-Chris
Marshall
2006-06-27 15:08:03 UTC
Permalink
The real question is, are there some programs that we
can't write *at all* in a statically typed language, because
they'll *never* be typable?
In a statically typed language that has a "dynamic" type, all
dynamically typed programs are straightforwardly expressible.
So, how does this "dynamic" type work? It can't simply be
the "any" type, because that type has no/few functions defined
on it. It strikes me that *when* exactly binding happens will
matter. In a statically typed language, it may be that all
binding occurs at compile time, and in a dynamic language,
it may be that all binding occurs at runtime. So you might
have to change the binding mechanism as well. Does the
"dynamic" type allow this?


Marshall
David Hopwood
2006-06-27 17:17:23 UTC
Permalink
Post by Marshall
The real question is, are there some programs that we
can't write *at all* in a statically typed language, because
they'll *never* be typable?
In a statically typed language that has a "dynamic" type, all
dynamically typed programs are straightforwardly expressible.
So, how does this "dynamic" type work?
<http://citeseer.ist.psu.edu/abadi89dynamic.html>
Post by Marshall
It can't simply be the "any" type, because that type has no/few
functions defined on it.
It isn't. From the abstract of the above paper:

[...] even in statically typed languages, there is often the need to
deal with data whose type cannot be determined at compile time. To handle
such situations safely, we propose to add a type Dynamic whose values are
pairs of a value v and a type tag T where v has the type denoted by T.
Instances of Dynamic are built with an explicit tagging construct and
inspected with a type safe typecase construct.
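
For concreteness, GHC ships essentially this construct as Data.Dynamic.
A minimal sketch (my own, not code from the paper): toDyn plays the role
of the tagging construct, and fromDynamic plays the role of typecase.

import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- A list whose elements have lost their static types; under the hood each
-- element is a pair of a value and a runtime type tag.
mixed :: [Dynamic]
mixed = [toDyn (1 :: Int), toDyn "two", toDyn (3.0 :: Double)]

-- The "typecase" side: fromDynamic succeeds only when the tag matches.
describe :: Dynamic -> String
describe d =
  case (fromDynamic d :: Maybe Int) of
    Just n  -> "an Int: " ++ show n
    Nothing -> case (fromDynamic d :: Maybe String) of
      Just s  -> "a String: " ++ s
      Nothing -> "something else"

main :: IO ()
main = mapM_ (putStrLn . describe) mixed    -- prints one line per element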

"Gradual typing" as described in
<http://www.cs.rice.edu/~jgs3847/pubs/pubs/2006/siek06:_gradual.pdf> is
another alternative. The difference between gradual typing and a
"dynamic" type is one of convenience rather than expressiveness --
gradual typing does not require explicit tagging and typecase constructs.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Diez B. Roggisch
2006-06-27 16:37:22 UTC
Permalink
The C++ type system is Turing complete, although in practical terms
it limits how much processing power it will spend on types at
compile time.
I think templates only have to expand to seven levels, so you are
severely limited here.
You can adjust the iteration depth. However, as Turing-completeness implies
the potential to create infinite loops, it enforces _some_ boundary.
Otherwise the user hitting C-c will do :)

Diez
rossberg
2006-06-29 21:10:08 UTC
Permalink
Which is why this actually is a very bad example to choose for dynamic
typing advocacy... ;-)
Actually, this seems a *good* example. The problem seems to be that
you end up throwing the baby out with the bathwater: your static type
system stops catching the errors you want once you make it powerful
enough to express certain programs.
That is not the case: you can still express these programs, you just
need to do an indirection through a datatype.
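
To make "indirection through a datatype" concrete, a tiny Haskell sketch
(my own illustration; the names are made up): the otherwise heterogeneous
values are injected into one sum type, and the checks a dynamically typed
language performs on tags become ordinary pattern matches.

-- A universal "value" type; the constructors play the role of runtime tags.
data Univ = I Int | S String | L [Univ]

-- Works uniformly on any Univ, dispatching on the tag at run time.
size :: Univ -> Int
size (I _)  = 1
size (S s)  = length s
size (L xs) = sum (map size xs)

-- e.g.  size (L [I 1, S "two", L [I 3]])  ==>  5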

- Andreas
Joe Marshall
2006-06-28 18:56:37 UTC
Permalink
The point is that there exists (by construction) programs that can
never be statically checked.
I don't think you've shown that. I would like to see a more explicit
construction of a dynamically typed [*] program with a given specification,
where it is not possible to write a statically typed program that meets
the same specification using the same algorithms.
I think we're at cross-purposes here. It is trivial to create a static
type system that has a `universal type' and defers all the required
type narrowing to runtime. In effect, we've `turned off' the static
typing (because everything trivially type checks at compile time).

However, a sufficiently powerful type system must be incomplete or
inconsistent (by Gödel). A type system proves theorems about a
program. If the type system is rich enough, then there are unprovable,
yet true theorems. That is, programs that are type-safe but do not
type check. A type system that could reflect on itself is easily
powerful enough to qualify.
AFAIK, it is *always* possible to construct such a statically typed
program in a type system that supports typerec or gradual typing. The
informal construction you give above doesn't contradict that, because
it depends on reflecting on a [static] type system, and in a dynamically
typed program there is no type system to reflect on.
A static type system is usually used for universal proofs: For all
runtime values of such and such... Dynamic checks usually provide
existential refutations: This particular value isn't a string. It may
be the case that there is no universal proof, yet no existential
refutation.
Note that I'm not claiming that you can check any desirable property of
a program (that would contradict Rice's Theorem), only that you can
express any dynamically typed program in a statically typed language --
with static checks where possible and dynamic checks where necessary.
Sure. And I'm not saying that you cannot express these programs in a
static language, but rather that there exist programs that have
desirable properties (that run without error and produce the right
answer) but that the desirable properties cannot be statically checked.

The real question is how common these programs are, and what sort of
desirable properties are we trying to check.
Joe Marshall
2006-06-27 21:26:31 UTC
Permalink
(defun noisy-apply (f arglist)
  (format t "I am now about to apply ~s to ~s" f arglist)
  (apply f arglist))

(defun blackhole (argument)
  (declare (ignore argument))
  #'blackhole)
But wait a sec. It seems that these were examples I invented in
response to the same question from you!
Ah, how well I remember that thread, and how little I got from it.
My memories of that thread are
1) the troll who claimed that Java was unable to read two numbers from
standard input and add them together.
2) The fact that all the smart people agreed the blackhole
function indicated something profound.
3) The fact that I didn't understand the black hole function.
The noisy-apply function I think I understand; it's generic on the
entire arglist. In fact, if I read it correctly, it's even generic
on the *arity* of the function, which is actually pretty impressive.
True?
Yes.
This is an issue I've been wrestling with in my own type
system investigations: how to address genericity across arity.
Does noisy-apply get invoked the same way as other functions?
That would be cool.
Yes, but in testing I found an incompatibility. Here's a better
definition:

(defun noisy-apply (f &rest args)
  (format t "~&Applying ~s to ~s." f (apply #'list* args))
  (apply f (apply #'list* args)))

CL-USER> (apply #'+ '(2 3))
5

CL-USER> (noisy-apply #'+ '(2 3))
Applying #<function + 20113532> to (2 3).
5

The internal application of list* flattens the argument list. The
Common Lisp APPLY function has variable arity and this change makes my
NOISY-APPLY behave identically.
As to the black hole function, could you explain it a bit?
It is a function that takes any argument and returns the function
itself.
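
For what it's worth, the black hole is expressible with a static type,
at the cost of a recursive (rank-2) newtype. A Haskell sketch of my own,
not from the thread:

{-# LANGUAGE RankNTypes #-}

-- A value that can be "applied" to an argument of any type and always
-- yields itself again.
newtype BlackHole = BlackHole { runBH :: forall a. a -> BlackHole }

blackhole :: BlackHole
blackhole = BlackHole (\_ -> blackhole)

-- e.g.  runBH (runBH blackhole 42) "anything"  is the black hole again.
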
I apologize
for my lisp-ignorance. I am sure there is a world of significance
in the # ' on the third line, but I have no idea what it is.
The #' is a namespace operator. Since functions and variables have
different namespaces, I'm indicating that I wish to return the function
named blackhole rather than any variable of the same name that might be
around.
Chris Smith
2006-06-22 21:08:39 UTC
Permalink
Looking at this thread growing, it appears to me that at least one thing
is clear: Xah's unwillingness to learn from bad past experience
contaminates others (who are still posting to his trolling threads).
Here is another try to rescue those who are just virgin enough not to
I am enjoying the discussion. I think several other people are, too.
(At least, I hope so!) It matters not one whit to me who started the
thread, or the merits of the originating post. Why does it matter to
you?
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Chris Uppal
2006-06-23 12:11:53 UTC
Permalink
In that case, you could say that the conceptual type is different than
the inferred static type. But most of the time, the human is reasoning
about pretty much the same types as the static types that Haskell
infers. Things would get a bit confusing otherwise.
Or any mechanised or formalised type system, for any language. If a system
doesn't match pretty closely with at least part of the latent type systems (in
your sense) used by the programmers, then that type system is useless.

(I gather that it took, or maybe is still taking, theorists a while to get to
grips with the informal type logics which were obvious to working OO
programmers.)

-- chris
Darren New
2006-06-21 17:07:20 UTC
Permalink
There are *tons* of languages that "actually" facilitate abstract data
types, and some of these languages are actually used by real people.
I don't know of any others in actual use. Could you name a couple?

Note that I don't consider things like usual OO languages (Eiffel,
Smalltalk, etc) to have "abstract data types".
--
Darren New / San Diego, CA, USA (PST)
My Bath Fu is strong, as I have
studied under the Showerin' Monks.
John W. Kennedy
2006-06-22 15:37:09 UTC
Permalink
Another language which has *neither* latent ("dynamic") nor
manifest ("static") types is (was?) BLISS[1], in which, like
assembler, variables are "just" addresses[2], and values are
"just" a machine word of bits.
360-family assembler, yes. 8086-family assembler, not so much.
--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"
Matthias Blume
2006-06-21 13:45:22 UTC
Permalink
Perhaps better: A language is statically typed if its definition
includes (or even better: is based on) a static type system, i.e., a
static semantics with typing judgments derivable by typing rules.
Usually typing judgments associate program phrases ("expressions") with
types given a typing environment.
This is defining a single term ("statically typed") using three
undefined terms ("typing judgements", "typing rules", "typing
environment").
This was not meant to be a rigorous definition.
Rigorous or not, introducing additional undefined terms doesn't help
with explaining a term.
I think you missed my point. My point was that a language is
statically typed IF IT IS DEFINED THAT WAY, i.e., if it has a static
type system that is PART OF THE LANGUAGE DEFINITION. The details are
up to each individual definition.
Also, I'm not going to repeat the textbook definitions for those
three standard terms here.
These terms certainly aren't standard for Perl, Python, Java, or Lisp,
Indeed. That's because these languages are not statically typed.
Neelakantan Krishnaswami
2006-06-14 22:57:58 UTC
Permalink
On a similar note, is a statically typed language more or less
expressive than a dynamically typed language? Some would say less, as
you can write programs in a dynamically typed language that you can't
compile in a statically typed language (without a lot of encoding),
whereas the converse isn't true.
It's important to get the levels right here: A programming language
with a rich static type system is more expressive at the type level,
but less expressive at the base level (for some useful notion of
expressiveness ;).
This doesn't seem obviously the case to me. If you have static
information about your program, the compiler can use this information
to automate a lot of grunt work away.

Haskell's system of typeclasses works this way. If you tell the
compiler how to print integers, and how to print lists, then when you
call a print function on a list of list of integers, then the compiler
will automatically figure out the right print function using your base
definitions. This yields an increase in Felleisen-expressiveness over
a dynamically typed language, because you would need to globally
restructure your program to achieve a similar effect.
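
A small Haskell sketch of that point (my own example, using a made-up
Pretty class instead of the built-in Show): given instances for the base
cases, the compiler assembles the printer for the nested type by itself.

import Data.List (intercalate)

class Pretty a where
  pretty :: a -> String

instance Pretty Int where
  pretty = show

instance Pretty a => Pretty [a] where
  pretty xs = "[" ++ intercalate ", " (map pretty xs) ++ "]"

-- The instance for [[Int]] is derived automatically from the two above:
-- pretty ([[1, 2], [3]] :: [[Int]])  ==>  "[[1, 2], [3]]"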

More dramatic are the "polytypic" programming languages, which let you
automate even more by letting you write generic map, fold, and print
functions which work at every type.
This doesn't seem to capture what I hear from Haskell programmers
who say that it typically takes quite a while to convince the
Haskell compiler to accept their programs. (They perceive this to be
worthwhile because of some benefits wrt correctness they claim to
get in return.)
This is true, if you are a novice learning the language, or if you are
an expert programming with good style.

If you encode your invariants in the types, then type errors will
signal broken invariants. But: learning how to use the type system to
encode significantly complex invariants (eg, that an abstract syntax
tree representing an HTML document actually satisfies all of the
constraints on valid HTML) takes experience to do well.
--
Neel Krishnaswami
neelk at cs.cmu.edu
Chris Smith
2006-06-22 18:34:17 UTC
Permalink
I think we're agreed (you and I anyway, if not everyone in this thread) that we
don't want to talk of "the" type system for a given language. We want to allow
a variety of verification logics. So a static type system is a logic which can
be implemented based purely on the program text without making assumptions
about runtime events (or making maximally pessimistic assumptions -- which comes
to the same thing really). I suggest that a "dynamic type system" is a
verification logic which (in principle) has available as input not only the
program text, but also the entire history of the program execution up to the
moment when the to-be-checked operation is invoked.
I am trying to understand how the above statement about dynamic types
actually says anything at all. So a dynamic type system is a system of
logic by which, given a program and a path of program execution up to
this point, verifies something. We still haven't defined "something",
though. We also haven't defined what happens if that verification
fails. One or the other or (more likely) some combination of the two
must be critical to the definition in order to exclude silly
applications of it. Presumably you want to exclude from your definition
of a dynamic "type system" one which merely verifies that a value is non-negative,
and if so executes the block of code following "then"; and otherwise,
executes the block of code following "else". Yet I imagine you don't
want to exclude ALL systems that allow the programmer to execute
different code when the verification fails (think exception handlers)
versus succeeds, nor exclude ALL systems where the condition is that a
value is non-negative.

In other words, I think that everything so far is essentially just
defining a dynamic type system as equivalent to a formal semantics for a
programming language, in different words that connote some bias toward
certain ways of looking at possibilities that are likely to lead to
incorrect program behavior. I doubt that will be an attractive
definition to very many people.
Note that not all errors that I would want to call type errors are necessarily
caught by the runtime -- it might go happily ahead never realising that it had
just allowed a violation of one of the constraints of one of the logics I use to reason about
the program. What's known as an undetected bug -- but just because the runtime
doesn't see it, doesn't mean that I wouldn't say I'd made a type error. (The
same applies to any specific static type system too, of course.)
In static type system terminology, this quite emphatically does NOT
apply. There may, of course, be undetected bugs, but they are not type
errors. If they were type errors, then they would have been detected,
unless the compiler is broken.

If you are trying to identify a set of dynamic type errors, in a way
that also applies to statically typed languages, then I will read on.
But the checks the runtime does perform (whatever they are, and whenever they
happen), do between them constitute /a/ logic of correctness. In many highly
dynamic languages that logic is very close to being maximally optimistic, but
it doesn't have to be (e.g. the runtime type checking in the JVM is pretty
pessimistic in many cases).
Anyway, that's more or less what I mean when I talk of dynamically typed
language and their dynamic type systems.
So my objections, then, are in the first paragraph.
[**] Although there are operations which are not possible, reading another
object's instvars directly for instance, which I suppose could be taken to
induce a non-trivial (and static) type logic.
In general, I wouldn't consider a syntactically incorrect program to
have a static type error. Type systems are, in fact, essentially a tool
to separate concerns; specifically, to remove type-correctness concerns
from the grammars of programming languages. By doing so, we are able at
least to considerably simplify the grammar of the language, and perhaps
also to increase the "tightness" of the verification without risking
making the language grammar context-sensitive. (I'm unsure about the
second part of that statement, but I can think of no obvious theoretical
reason to assume that combining a type system with a regular context-
free grammar would yield another context-free grammar. Then again,
formal languages are not my strong point.)
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Chris Uppal
2006-06-23 10:55:04 UTC
Permalink
It's worth noting, too, that (in some sense) the type of an object
can change over time[*].
No. Since a type expresses invariants, this is precisely what may
*not* happen. If certain properties of an object may change then the
type of
the object has to reflect that possibility. Otherwise you cannot
legitimately call it a type.
Well, it seems to me that you are /assuming/ a notion of what kinds of
logic can be called type (theories), and I don't share your
assumptions. No offence intended.
OK, but can you point me to any literature on type theory that makes a
different assumption?
'Fraid not. (I'm not a type theorist -- for all I know there may be lots, but
my suspicion is that they are rare at best.)

But perhaps I shouldn't have used the word theory at all. What I mean is that
there is one or more logic of type (informal or not -- probably informal) with
respect to which the object in question has changed its categorisation. If no
existing type /theory/ (as devised by type theorists) can handle that case,
then that is a deficiency in the set of existing theories -- we need newer and
better ones.

But, as a sort of half-way, semi-formal, example: consider the type environment
in a Java runtime. The JVM does formal type-checking of classfiles as it loads
them. In most ways that checking is static -- it's treating the bytecode as
program text and doing a static analysis on it before allowing it to run (and
rejecting what it can't prove to be acceptable by its criteria). However, it
isn't /entirely/ static because the collection of classes varies at runtime in
a (potentially) highly dynamic way. So it can't really examine the "whole"
text of the program -- indeed there is no such thing. So it ends up with a
hybrid static/dynamic type system -- it records any assumptions it had to make
in order to find a proof of the acceptability of the new code, and if (sometime
in the future) another class is proposed which violates those assumptions, then
that second class is rejected.
I see no reason,
even in practise, why a static analysis should not be able to see that
the set of acceptable operations (for some definition of acceptable)
for some object/value/variable can be different at different times in
the execution.
Neither do I. But what is wrong with a mutable reference-to-union type,
as I suggested? It expresses this perfectly well.
Maybe I misunderstood what you meant by union type. I took it to mean that the
type analysis didn't "know" which of the two types was applicable, and so would
reject both (or maybe accept both ?). E.g. if at instant A some object, obj,
was in a state where it responds to #aMessage, but not #anotherMessage; and
at instant B it is in a state where it responds to #anotherMessage but not
#aMessage. In my (internal and informal) type logic, I make the following
judgements:

In code which will be executed at instant A
obj aMessage. "type correct"
obj anotherMessage. "type incorrect"

In code which will be executed at instant B
obj aMessage. "type incorrect"
obj anotherMessage. "type correct"

I don't see how a logic with no temporal element can arrive at all four those
judgements, whatever it means by a union type.

-- chris
Raffael Cavallaro
2006-06-14 14:51:05 UTC
Permalink
On 2006-06-14 09:42:25 -0400, torbenm at app-1.diku.dk (Torben Ægidius
It takes longer for the average
programmer to get the program working in the dynamically typed
language.
Though I agree with much of your post I would say that many here find
the opposite to be true - it takes us longer to get a program working
in a statically typed language because we have to keep adding/changing
things to get the compiler to stop complaining and actually compile and
run a program which would be perfectly permissible in a dynamically
typed language such as common lisp - for example - heterogeneous lists
and forward references to as yet non-existent functions.
Joachim Durchholz
2006-06-16 22:09:26 UTC
Permalink
Give a heterogeneous list that would be too awkward to live in a
statically-typed language.
Write a function that takes an arbitrary set of arguments and stores
them into a structure allocated on the heap.
If the set of arguments is really arbitrary, then the software can't do
anything with it. In that case, the type is simply "opaque data block",
and storing it in the heap requires nothing more specific than that of
"opaque data block".
There's more in this. If we see a function with a parameter type of
"opaque data block", and there's no function available except copying
that data and comparing it for equality, then from simply looking at the
function's signature, we'll know that it won't inspect the data. More
interestingly, we'll know that no funny stuff in the data can trigger
bugs in the code - in the context of a security audit, that's actually a
pretty strong guarantee, since the analysis can stop at the function's
interface and doesn't have to dig into the function's implementation.
Give a case of calling nonexistent functions that's useful.
See the Tcl "unknown" proc, used for interactive command expansion,
dynamic loading of code on demand, etc.
Not related to dynamic typing, I fear - I can easily envision
alternatives to that in a statically-typed context.

Of course, you can't eliminate *all* run-time type checking. I already
mentioned unmarshalling data from an untyped source; another possibility
is run-time code compilation (highly dubious in a production system but
of value in a development system).

However, that's some very specialized applications, easily catered for
by doing a dynamic type check plus a thrown exception in case the types
don't match. I still don't see a convincing argument for making dynamic
typing the standard policy.

Regards,
Jo
Raffael Cavallaro
2006-06-16 15:49:23 UTC
Permalink
And this is a typical dynamic type advocate's response when told that
"*I* don't see the usefulness of static typing so *you* shouldn't want
it, either."
But I haven't made this sort of argument. I never said you shouldn't
use static typing if you want to. There are indeed types of software
where one wants the guarantees provided by static type checks. For
example, software that controls irreplaceable or very expensive
equipment such as space craft, or software that can kill people if it
fails such as software for aircraft or medical devices. The problem for
static typing advocates is that most software is not of this type.

There is a very large class of software where user inputs are
unpredictable and/or where input data comes from an untrusted source.
In these cases run-time checks are going to be needed anyway so the
advantages of static type checking are greatly reduced - you end up
doing run-time checks anyway, precisely the thing you were trying to
avoid by doing static analysis. In software like this it isn't worth
satisfying a static type checker because you don't get much of the
benefit anyway and it means forgoing such advantages of dynamic typing
as being able to run and test portions of a program before other parts
are written (forward references to as yet nonexistent functions).

Ideally one wants a language with switchable typing - static where
possible and necessary, dynamic elsewhere. To a certain extent this is
what common lisp does but it requires programmer declarations. Some
implementations try to move beyond this by doing type inference and
alerting the programmer to potential static guarantees that the
programmer could make that would allow the compiler to do a better job.

In effect the argument comes down to which kind of typing one thinks
should be the default. Dynamic typing advocates think that static
typing is the wrong default. The notion that static typing can prove
program correctness is flawed - it can only prove that type constraints
are not violated but not necessarily that program logic is correct. It
seems to me that if we set aside that class of software where safety is
paramount - mostly embedded software such as aircraft and medical
devices - we are left mostly with efficiency concerns. The 80-20 rule
suggests that most code doesn't really need the efficiency provided by
static guarantees. So static typing should be invoked for that small
portion of a program where efficiency is really needed, and dynamic
typing should be the default elsewhere. This is how common lisp works -
dynamic typing by default with static guarantees available where one
needs them.
Torben Ægidius Mogensen
2006-06-19 08:19:05 UTC
Permalink
This is purely a matter of programming style. For explorative
programming it is easier to start with dynamic typing and add static
guarantees later rather than having to make decisions about
representation and have stubs for everything right from the start.
I think you are confusing static typing with having to write types
everywhere. With type inference, you only have to write a minimum of
type information (such as datatype declarations), so you can easily do
explorative programming in such languages -- I don't see any advantage
of dynamic typing in this respect.
The
lisp programming style is arguably all about using heterogeneous lists
and forward references in the repl for everything until you know what
it is that you are doing, then choosing a more appropriate
representation and filling in forward references once the program
gels. Having to choose representation right from the start and needing
working versions (even if only stubs) of *every* function you call may
ensure type correctness, but many programmers find that it also
ensures that you never find the right program to code in the first
place.
If you don't have definitions (stubs or complete) of the functions you
use in your code, you can only run it up to the point where you call
an undefined function. So you can't really do much exploration until
you have some definitions.

I expect a lot of the exploration you do with incomplete programs
amounts to the feedback you get from type inference.
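
To make that concrete, a small sketch of my own (the function names are
made up) of how this kind of exploration looks in Haskell: functions not
yet written get one-line stubs, and the unannotated parts still get
useful inferred types.

-- A stub for a function I haven't thought about yet; it type-checks and
-- only fails if it is actually called.
parseConfig :: String -> [(String, Int)]
parseConfig = error "TODO: parseConfig not written yet"

-- No annotation here; GHC infers something like  Num a => [a] -> a,
-- which is itself useful feedback about what was just written.
total xs = sum (map (* 2) xs)

main :: IO ()
main = print (total [1, 2, 3])   -- explore this part now, parseConfig later
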
This is because you don't have the freedom to explore possible
solutions without having to break your concentration to satisfy the
nagging of a static type checker.
I tend to disagree. I have programmed a great deal in Lisp, Scheme,
Prolog (all dynamically typed) and SML and Haskell (both statically
typed). And I don't find that I need more stubs etc. in the latter.
In fact, I do a lot of explorative programming when writing programs
in ML and Haskell. And I find type inference very helpful in this, as
it guides the direction of the exploration, so it is more like a
safari with a local guide than a blindfolded walk in the jungle.

Torben
George Neuner
2006-06-19 17:33:23 UTC
Permalink
On 19 Jun 2006 13:53:01 +0200, torbenm at app-1.diku.dk (Torben Ægidius
On 19 Jun 2006 10:19:05 +0200, torbenm at app-3.diku.dk (Torben Ægidius
Post by Torben Ægidius Mogensen
I expect a lot of the exploration you do with incomplete programs
amount to the feedback you get from type inference.
The ability to write functions and test them immediately without
writing a lot of supporting code is _far_ more useful to me than type
inference.
I can't see what this has to do with static/dynamic typing. You can
test individual functions in isolation in statically typed languages
too.
It has nothing to do with static/dynamic typing and that was the point
... that support for exploratory programming is orthogonal to the
language's typing scheme.

George
--
for email reply remove "/" from address
Joachim Durchholz
2006-06-16 21:59:07 UTC
Permalink
Post by Raffael Cavallaro
There is a very large class of software where user inputs are
unpredictable and/or where input data comes from an untrusted source. In
these cases run-time checks are going to be needed anyway so the
advantages of static type checking are greatly reduced - you end up
doing run-time checks anyway, precisely the thing you were trying to
avoid by doing static analysis.
There's still a large class of errors that *can* be excluded via type
checking.
Post by Raffael Cavallaro
Ideally one wants a language with switchable typing - static where
possible and necessary, dynamic elsewhere.
That has been my position for a long time now.
Post by Raffael Cavallaro
To a certain extent this is
what common lisp does but it requires programmer declarations. Some
implementations try to move beyond this by doing type inference and
alerting the programmer to potential static guarantees that the
programmer could make that would allow the compiler to do a better job.
I think it's easier to start with a good (!) statically-typed language
and relax the checking, than to start with a dynamically-typed one and
add static checks.
With the right restrictions, a language can make all kinds of strong
guarantees, and it can make it easy to construct software where static
guarantees abound. If the mechanisms are cleverly chosen, they interfere
just minimally with the programming process. (A classical example is
Hindley-Milner type inference systems. Typical reports from languages
with HM systems say that you can have it verify thousand-line programs
without a single type annotation in the code. That's actually far better
than you'd need - you'd *want* to document the types at least on the
major internal interfaces after all *grin*.)
With a dynamically-typed language, programming style tends to evolve in
directions that make it harder to give static guarantees.
Post by Raffael Cavallaro
It seems to
me that if we set aside that class of software where safety is paramount
- mostly embedded software such as aircraft and medical devices - we are
left mostly with efficiency concerns.
Nope. Efficiency has taken a back seat. Software is getting slower
(barely offset by increasing machine speed), and newer languages don't
even statically typecheck everything (C++, Java). (Note that the
impossibility to statically typecheck everything in OO languages doesn't
mean that it's impossible to do rigorous static checking in general.
FPLs have been quite rigorous about static checks; the only cases when
an FPL needs to dynamically typecheck its data structures is after
unmarshalling one from an untyped data source such as a network stream,
a file, or an IPC interface.)

The prime factor nowadays seems to be maintainability.

And the difference here is this:
With dynamic typing, I have to rely on the discipline of the programmers
to document interfaces.
With static typing, the compiler will infer (and possibly document) at
least part of their semantics (namely the types).
Post by Raffael Cavallaro
So static typing should be invoked for that small portion of a program
where efficiency is really needed and that dynamic typing should be the
default elswhere. This is how common lisp works - dynamic typing by
default with static guarantees available where one needs them.
Actually static typing seems to become more powerful at finding errors
as the program size increases.
(Yes, that's a maintainability argument. Efficiency isn't *that*
important; since maintenance is usually the most important single
factor, squelching bugs even before testing is definitely helpful.)

Regards,
Jo
Raffael Cavallaro
2006-06-16 22:25:09 UTC
Permalink
Post by Joachim Durchholz
I think it's easier to start with a good (!) statically-typed language
and relax the checking, than to start with a dynamically-typed one and
add static checks.
With the right restrictions, a language can make all kinds of strong
guarantees, and it can make it easy to construct software where static
guarantees abound. If the mechanisms are cleverly chosen, they
interfere just minimally with the programming process. (A classical
example is Hindley-Milner type inference systems. Typical reports from
languages with HM systems say that you can have it verify thousand-line
programs without a single type annotation in the code. That's actually
far better than you'd need - you'd *want* to document the types at
least on the major internal interfaces after all *grin*.)
With a dynamically-typed language, programming style tends to evolve in
directions that make it harder to give static guarantees.
This is purely a matter of programming style. For explorative
programming it is easier to start with dynamic typing and add static
guarantees later rather than having to make decisions about
representation and have stubs for everything right from the start. The
lisp programming style is arguably all about using heterogeneous lists
and forward references in the repl for everything until you know what
it is that you are doing, then choosing a more appropriate
representation and filling in forward references once the program gels.
Having to choose representation right from the start and needing
working versions (even if only stubs) of *every* function you call may
ensure type correctness, but many programmers find that it also ensures
that you never find the right program to code in the first place. This
is because you don't have the freedom to explore possible solutions
without having to break your concentration to satisfy the nagging of a
static type checker.
Darren New
2006-06-16 16:59:41 UTC
Permalink
Give a heterogeneous list that would be too awkward to live in a
statically-typed language.
Printf()?
--
Darren New / San Diego, CA, USA (PST)
My Bath Fu is strong, as I have
studied under the Showerin' Monks.
Matthias Blume
2006-06-16 20:45:26 UTC
Permalink
Very good statically typed versions of printf exist. See, e.g.,
Danvy's unparsing combinators.
That seems to ignore the fact that the pattern is a string, which
means that printf's first argument in Danvy's mechanism has to be a
literal.
In Danvy's solution, the format argument is not a string.
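
For concreteness, a minimal Haskell sketch in the spirit of Danvy's
functional unparsing (my own simplification, not his actual SML code):
the format is an ordinary typed expression built from combinators, so
the compiler checks that the arguments match it.

-- Each directive extends the accumulated output and the argument list.
lit :: String -> (String -> a) -> String -> a
lit s k out = k (out ++ s)

int :: (String -> a) -> String -> Int -> a
int k out n = k (out ++ show n)

str :: (String -> a) -> String -> String -> a
str k out s = k (out ++ s)

-- Close a format off with the identity continuation and empty output.
fmt :: ((String -> String) -> String -> a) -> a
fmt d = d id ""

-- fmt (lit "x = " . int . lit ", y = " . str) 42 "hello"
--   ==>  "x = 42, y = hello"
-- Passing a String where the Int is expected is a compile-time error.
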
You can't read the printf format from a configuration file
(for example) to support separate languages.
You don't need to do that if you want to support separate languages.
Moreover, reading the format string from external input is a good way
of opening your program to security attacks, since ill-formed data on
external media are then able to crash your program.
It doesn't look like the
version of printf that can print its arguments in an order different
from the order provided in the argument list is supported either;
something like "%3$d" or some such.
I am not familiar with the version of printf you are referring to, but
I am sure one could adapt Danvy's solution to support such a thing.
Second, what's the type of the argument that printf, sprintf, fprintf,
kprintf, etc all pass to the subroutine that actually does the
formatting? (Called vprintf, I think?)
Obviously, a Danvy-style solution (see, e.g., the one in SML/NJ's
library) is not necessarily structured that way. I don't see the
problem with typing, though.

Matthias
Darren New
2006-06-16 16:45:46 UTC
Permalink
Give a heterogeneous list that would be too awkward to live in a
statically-typed language.
Write a function that takes an arbitrary set of arguments and stores
them into a structure allocated on the heap.
Give a case of calling nonexistent functions that's useful.
See the Tcl "unknown" proc, used for interactive command expansion,
dynamic loading of code on demand, etc.
--
Darren New / San Diego, CA, USA (PST)
My Bath Fu is strong, as I have
studied under the Showerin' Monks.
Sacha
2006-06-16 10:10:17 UTC
Permalink
"Joachim Durchholz" <jo at durchholz.org> wrote in message
Um... heterogeneous lists are not necessarily a sign of expressiveness.
The vast majority of cases can be transformed to homogenous lists
(though these might then contain closures or OO objects).
As to references to nonexistent functions - heck, I never missed these,
not even in languages without type inference :-)
[[snipped - doesn't seem to relate to your answer]]
Give a heterogeneous list that would be too awkward to live in a
statically-typed language.
Many lists are heterogeneous, even in statically typed languages.
For instance, Lisp code is lists, with several kinds of atoms and
sub-lists.
A car dealer will sell cars, trucks and equipment.
In a statically typed language you would need to type the list on a common
ancestor. What would then be the point of static typing, as you still need
to type-check each element in order to process that list? Sure, you can do
this in a statically-typed language, you just need to make sure some
relevant ancestor exists. In my experience you'll end up with the base
object-class more often than not, and that's what I call dynamic typing.
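
For comparison, a sketch of my own of how the car-dealer list typically
comes out in Haskell: the "common ancestor" is just a sum type, and
processing the list is a pattern match rather than a cast.

-- One sum type covering everything the dealer sells.
data Vehicle
  = Car String Int        -- model, number of doors
  | Truck String Int      -- model, payload in kg
  | Equipment String      -- description

inventory :: [Vehicle]
inventory = [Car "Corolla" 4, Truck "Hauler" 9000, Equipment "Car lift"]

label :: Vehicle -> String
label (Car m d)     = m ++ " (" ++ show d ++ "-door car)"
label (Truck m p)   = m ++ " (truck, " ++ show p ++ " kg payload)"
label (Equipment d) = d
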
Give a case of calling nonexistent functions that's useful.
I might want to test some other parts of my program before writing this
function.
Or maybe my program will compile that function depending on user input.
As long as I get a warning for calling a non-existing function, everything
is fine.

Sacha
Joachim Durchholz
2006-06-21 11:24:39 UTC
Permalink
(It's really important to understand that the idea is to use this for
deployed programs - albeit hopefully in a more structured fashion - and
not only for debugging. The example I have given is an extreme one that
you would probably not use as such in a "real-world" setting, but it
shows that there is a boundary beyond which static type systems cannot
be used in a meaningful way anymore, at least as far as I can tell.)
As soon as the running program can be updated, the distinction between
"static" (compile time) and "dynamic" (run time) blurs.
You can still erect a definition for such a case, but it needs to refer
to the update process, and hence becomes language-specific. In other
words, language-independent definitions of dynamic and static typing
won't give any meaningful results for such languages.

I'd say it makes more sense to talk about what advantages of static vs.
dynamic typing can be applied in such a situation.
E.g. one interesting topic would be the change in trade-offs: making
sure that a type error cannot occur becomes much more difficult
(particularly if the set of available types can change during an
update), so static typing starts to lose some of its appeal; OTOH a good
type system can give you a lot of guarantees even in such a situation,
even if it might have to revert to the occasional run-time type check,
so static checking still has its merits.

Regards,
Jo
Pascal Costanza
2006-06-26 17:41:06 UTC
Permalink
While this effort to salvage the term "type error" in dynamic
languages is interesting, I fear it will fail. Either we'll all have
to admit that "type" in the dynamic sense is a psychological concept
with no precise technical definition (as was at least hinted by
Anton's post earlier, whether intentionally or not) or someone is
going to have to propose a technical meaning that makes sense,
independently of what is meant by "type" in a static system.
What about this: You get a type error when the program attempts to
invoke an operation on values that are not appropriate for this operation.
Examples: adding numbers to strings; determining the string-length of a
number; applying a function on the wrong number of parameters; applying
a non-function; accessing an array with out-of-bound indexes; etc.
This makes essentially all run-time errors (including assertion failures,
etc.) "type errors". It is neither consistent with, nor any improvement
on, the existing vaguely defined usage.
Nope. This is again a matter of getting the levels right.

Consider division by zero: appropriate arguments for division are
numbers, including the zero. The dynamic type check will typically not
check whether the second argument is zero, but will count on the fact
that the processor will raise an exception one level deeper.

This is maybe better understandable in user-level code. Consider the
following class definition:

class Person {
    String name;
    int age;

    void buyPorn() {
        if (this.age < 18) throw new AgeRestrictionException();
        ...
    }
}

The message p.buyPorn() is a perfectly valid message send and will pass
both static and dynamic type tests (given p is of type Person in the
static case). However, you will still get a runtime error.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Chris Smith
2006-06-26 17:49:51 UTC
Permalink
Consider division by zero: appropriate arguments for division are
numbers, including the zero. The dynamic type check will typically not
check whether the second argument is zero, but will count on the fact
that the processor will raise an exception one level deeper.
Of course zero is not appropriate as a second argument to the division
operator! I can't possibly see how you could claim that it is. The
only reasonable statement worth making is that there doesn't exist a
type system in widespread use that is capable of checking this.
This is maybe better understandable in user-level code. Consider the
class Person {
    String name;
    int age;
    void buyPorn() {
        if (this.age < 18) throw new AgeRestrictionException();
        ...
    }
}
The message p.buyPorn() is a perfectly valid message send and will pass
both static and dynamic type tests (given p is of type Person in the
static case).
It appears you've written the code above to assume that the type system
can't verify that age >= 18... otherwise, the if statement would not
make sense. It also looks like Java, in which the type system is indeed
not powerful enough to do that check statically. However, it sounds as
if you're claiming that it wouldn't be possible for the type system to
do this? If so, that's not correct. If such a thing were checked at
compile-time by a static type check, then failing to actually provide
that guarantee would be a type error, and the compiler would tell you
so.
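
For illustration, a hedged Haskell sketch of my own (not anyone's
proposal in the thread), using the common "smart constructor" idiom: the
age check still happens once at run time, but the type system then
guarantees that buyPorn can never be reached without it.

data Person = Person { name :: String, age :: Int }

-- In a real module the Adult constructor would not be exported, so the
-- only way to obtain an Adult is via mkAdult.
newtype Adult = Adult Person

mkAdult :: Person -> Maybe Adult
mkAdult p
  | age p >= 18 = Just (Adult p)
  | otherwise   = Nothing

buyPorn :: Adult -> IO ()
buyPorn (Adult p) = putStrLn (name p ++ " may proceed.")

-- Checking "age >= 18" with no run-time test at all would need a more
-- expressive system, e.g. refinement or dependent types.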

Whether you'd choose to call an equivalent runtime check a "dynamic type
check" is a different matter, and highlights the fact that again there's
something fundamentally different about the definition of a "type" in
the static and dynamic sense.
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Rob Thorpe
2006-06-19 08:48:10 UTC
Permalink
That's the point: Bugs that in dynamically typed languages would
require testing to find are found by the compiler in a statically
typed language. So while it may take longer to get a program that gets
past the compiler, it takes less time to get a program that works.
In my experience the opposite is true for many programs.
Having to actually know the precise type of every variable while
writing the program is not necessary, it's a detail often not relevant
to the core problem. The language should be able to take care of
itself.
In complex routines it can be useful for the programmer to give types
and for the compiler to issue errors when they are contradicted. But
for most routines it's just an unnecessary chore that the compiler
forces on the programmer.
Indeed. So use a language with type inference.
Well, for most purposes that's the same as dynamic typing since the
compiler doesn't require you to label the type of your variables. I
occasionally use CMUCL and SBCL which do type inference, which is
useful for improving generated code quality. It can also warn the
programmer if they reuse a variable in a context implying that it's a
different type, which is useful.

I see type inference as an optimization of dynamic typing rather than a
generalization of static typing. But I suppose you can see it that way
around.
Chris F Clark
2006-06-26 22:13:16 UTC
Permalink
These informal systems, which may not prove what they claim to prove
are my concept of a "type system".
Okay, that works. I'm not sure where it gets us, though....
Ok, we'll get there. First, we need to step back in time, to when there
was roughly algol, cobol, fortran, and lisp. Back then, at least as I
understood things, there were only a few types that people generally
understood: integer, real, and (perhaps) pointer. Now, with algol or
fortran things were generally only of the first two types and
variables were statically declared to be one or the other. With lisp
any cell could hold any one of the 3, and some clever implementor
added "tag bits" to distinguish which the cell held. As I understood
it, the tag bits weren't originally for type correctness, so much as
they facilitated garbage collection. The garbage collector didn't
know what type of data you put in a cell and had to determine which
cells contained pointers, so that the mark-and-sweep algorithms could
sweep by following the pointers (and not following the integers that
looked like pointers). Still, these tag bits did preserve the
"dynamic" type, in the sense that we knew types from the other
languages. As I remember it, sophistication with type really didn't
occur as a phenomena for the general programmer until the introduction
of Pascal (and to a lesser extent PL/I). Moreover, as I recall it
(perhaps because I didn't learn the appropriate dialect), lisp
dialects kept with the more general notion of type (lists and tuples)
and eschewed low-level bit-fiddling where we might want to talk about
a 4 bit integer representing 0 .. 15 or -8 .. 7.

The important thing is the dynamicism of lisp allowed one to write
polymorphic programs, before most of us knew the term. However, we
still had a notion of type: integers and floating point numbers were
still different and one could not portably use integer operations on
floating pointer values or vice versa. However, one could check the
tag and "do the right thing" and many lisp primitives did just that,
making them polymorphic.

The way most of us conceived (or were taught to conceive) of the
situation was that there still were types, they were just associated
with values and the type laws we all knew and loved still worked, they
just worked dynamically upon the types of the operand values that were
presented at the time.

Can this be made rigorous? Perhaps.

Instead of viewing the text of the program statically, let us view it
dynamically, that is by introducing a time (or execution) dimension.
This is easier if you take an imperative view of the dynamic program
and imagine things having an operational semantics, which is why we
stepped back in time in the first place, so that we are in a world
where imperative programming is the default model for most
programmers.

Thus, as we traverse a list, the first element might be an integer,
the second a floating point value, the third a sub-list, the fourth
and fifth, two more integers, and so on. If you look statically at
the head of the list, we have a very wide union of types going by.
However, perhaps there is a mapping going on that can be discerned.
For example, perhaps the list has 3 distinct types of elements
(integers, floating points, and sub-lists) and it circles through the
types in the order, first having one of each type, then two of each
type, then four of each type, then eight, and so on. The world is now
patterned.

However, I don't know how to write a static type annotation that
describes exactly that type. That may just be my lack of experience
in writing annotations. However, it's well within my grasp to imagine
the appropriate type structure, even though **I** can't describe it
formally. More importantly, I can easily write something which checks
the tags and sees whether the data corresponds to my model.

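For concreteness, a rough sketch of such a check, in Python (illustrative
only; the names and the language choice are mine, not part of the original
discussion):

    def matches_model(xs):
        # Verify the pattern described above: the list cycles through
        # int, float, sub-list, with one of each, then two of each,
        # then four of each, and so on.
        kinds = (int, float, list)
        i, run = 0, 1
        while i < len(xs):
            for kind in kinds:
                for _ in range(run):
                    if i == len(xs) or not isinstance(xs[i], kind):
                        return False   # data doesn't fit the model: a "type error"
                    i += 1
            run *= 2
        return True

    matches_model([1, 2.0, [], 3, 4, 5.0, 6.0, [], []])   # True
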
And, this brings us to the point, if the data that my program
encounters doesn't match the model I have formulated, then something
is of the wrong "type". Equally importantly, the dynamic tags have
helped me discover that type error.

Now, the example may have seemed arbitrary to you, and it was in some
sense arbitrary. However, if one imagines a complete binary tree with
three kinds of elements stored in memory as rows per depth, one can get
exactly the structure I described. And, if one were rewriting that
unusual representation to a more normal one, one might want exactly
the "type check" I proposed to validate that the input binary tree was
actually well formed.

Does this help explain dynamic typing as a form of typing?

-Chris
Marshall
2006-06-21 15:09:06 UTC
Permalink
On a semantic level, the tag is always there - it's the type (and
definitely part of an axiomatic definition of the language).
Tag elimination is "just" an optimization.
I see what you're saying, but the distinction is a bit fine for me.
If the language has no possible mechanism to observe the
it-was-only-eliminated-as-an-optimization tag, in what sense
is it "always there?" E.g. The 'C' programming language.


Marshall
Chris F Clark
2006-06-25 01:01:36 UTC
Permalink
Unfortunately, I have to again reject this idea. There is no such
restriction on type theory. Rather, the word type is defined by type
theorists to mean the things that they talk about.
Do you reject that there could be something more general than what a
type theorist discusses? Or do you reject calling such things a type?
Because we could say that anything that checks types is a type system,
and then worry about verifying that it's a sound type system without
worrying about whether it's a subset of the perfect type system.
I'm particularly interested if something unsound (and perhaps
ambiguous) could be called a type system. I definitely consider such
things type systems. However, I like my definitions very general and
vague. Your writing suggests the opposite preference. To me, if
something works in a way analogous to a known type system, I tend
to consider it a "type system". That probably isn't going to be at
all satisfactory to someone wanting a more rigorous definition. Of
course, to my mind, the rigorous definitions are just an attempt to
capture something that is already known informally and put it on a
more rational foundation.
So what is the domain of a function? (Heck, what is a function?
...
(I need a word here that implies something like a set, but I don't care
to verify the axioms of set theory. I'm just going to use set. Hope
that's okay.)
Yes, I meant the typical HS algebra definition of domain, which is a
set, same thing for function. More rigorous definitions can be
substituted as necessary and appropriate.
1. The domain is the set of inputs to that expression which are going to
produce a correct result.
2. The domain is the set of inputs that I expected this expression to
work with when I wrote it.
3. The domain is the set of inputs for which the expression has a
defined result within the language semantics.
So the problem, then, more clearly stated, is that we need something
stronger than #3 and weaker than #1, but #2 includes some psychological
nature that is problematic to me (though, admittedly, FAR less
problematic than the broader uses of psychology to date.)
Actually, I like 2 quite well. There is some set in my mind when I'm
writing a particular expression. It is likely an ill-defined set, but
I don't notice that. That set is exactly the "type". I approximate
that set and its relationships to other sets in my program with the
typing machinery of the language I am using. That is the static (and
well-founded) type. It is, however, only an approximation to the ideal
"real" type. If I could make that imaginary mental set concrete (and
well-defined), that would be exactly my concept of type.

Now, that overplays the conceptual power in my mind. My mental image
is influenced by my knowledge of the semantics of the language and
also any implementations I may have used. Thus, I will weaken 2 to be
closer to 3, because I don't expect a perfectly ideal type system,
just an approximation. To the extent that I am aware of gotchas in
the language or a relevant implementation, I may write extra code to
try to limit things to model 1. Note that in my fallible mind, 1 and
2 are identical. In my hubris, I expect that the extra checks I have
made have restricted my inputs to where model 3 and 1 coincide.

Expanding that a little. I expect the language to catch type errors
where I violate model 3. To the extent model 3 differs from model 1
and my code hasn't added extra checks to catch it, I have bugs
resulting from undetected type errors or perhaps modelling errors. To
me they are type errors, because I expect that there is a typing
system that captures 1 for the program I am writing, even though I
know that that is not generally true.

As I reflect on this more, I really do like 2 in the sense that I
believe there is some abstract Platonic set that is an ideal type
(like 1) and that is what the type system is approximating. To the
extent that languages approximate either 1 or 2, those facilities are
type facilities of a language. That puts me very close to your
(rejected) definition of type as the well-defined semantics of the
language. Except I think of types as the sets of values that the
language provides, so it isn't the entire semantics of the language,
just that part which divides values into discrete sets which are
appropriate to different operations (and detect inappropriate values).

-Chris
Anton van Straaten
2006-06-26 06:20:20 UTC
Permalink
What makes static type systems interesting is not the fact that these
logical processes of reasoning exist; it is the fact that they are
formalized with definite axioms and rules of inference, performed
entirely on the program before execution, and designed to be entirely
evaluatable in finite time bounds according to some procedure or set of
rules, so that a type system may be checked and the same conclusion
reached regardless of who (or what) is doing the checking. All that,
and they still reach interesting conclusions about programs.
There's only one issue mentioned there that needs to be negotiated
w.r.t. latent types: the requirement that they are "performed entirely
on the program before execution". More below.
If
informal reasoning about types doesn't meet these criteria (and it
doesn't), then it simply doesn't make sense to call it a type system
(I'm using the static terminology here). It may make sense to discuss
some of the ways that programmer reasoning resembles types, if indeed
there are resemblances beyond just that they use the same basic rules of
logic. It may even make sense to try to identify a subset of programmer
reasoning that even more resembles... or perhaps even is... a type
system. It still doesn't make sense to call programmer reasoning in
general a type system.
I'm not trying to call programmer reasoning in general a type system.
I'd certainly agree that a programmer muddling his way through the
development of a program, perhaps using a debugger to find all his
problems empirically, may not be reasoning about anything that's
sufficiently close to types to call them that. But "latent" means what
it implies: someone who is more aware of types can do better at
developing the latent types.

As a starting point, let's assume we're dealing with a disciplined
programmer who (following the instructions found in books such as the
one at htdp.org) reasons about types, writes them in his comments, and
perhaps includes assertions (or contracts in the sense of "Contracts for
Higher-Order Functions" [1]), to help check his types at runtime (and I'm
talking about real type-checking, not just tag checking).

When such a programmer writes a type signature as a comment associated
with a function definition, in many cases he's making a claim that's
provable in principle about the inputs and outputs of that function.

For example, if the type in question is "int -> int", then the provable
claim is that given an int, the function cannot return anything other
than an int. Voila, assuming we can actually prove our claim, then
we've satisfied the requirement in Pierce's definition of type system,
i.e. "proving the absence of certain program behaviors by classifying
phrases according to the kinds of values they compute."

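A minimal sketch of that practice, in Python (a hypothetical example, not
from the thread): the latent type is recorded as a comment, and a runtime
contract checks the claim.

    def double(n):
        # latent type: int -> int
        # (the provable claim: given an int, this returns an int)
        assert isinstance(n, int), "contract violated: expected an int"
        result = n * 2
        assert isinstance(result, int), "contract violated: result not an int"
        return result

For a base type like int the contract happens to coincide with a tag check;
contracts for higher-order functions need more machinery, as in the paper
cited in [1] below.
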
The fact that the proof in question may not be automatically verified is
no more relevant than the fact that so many proofs in mathematics have
not been automatically verified. Besides, if the proof turns out to be
wrong, we at least have an automated mechanism for finding witnesses to
the error: runtime tag checks will generate an error. Such detection is
not guaranteed, but again, "proof" does not imply "automation".

What I've just described walks like a type and talks like a type, or at
least it appears to do so according to Pierce's definition. We can
classify many terms in a given dynamically-typed program on this basis
(although some languages may be better at supporting this than others).

So, on what grounds do you reject this as not being an example of a
type, or at the very least, something which has clear and obvious
connections to formal types, as opposed to simply being arbitrary
programmer reasoning?

Is it the lack of a formalized type system, perhaps? I assume you
recognize that it's not difficult to define such a system.

Regarding errors, we haven't proved that the function in question can
never be called with something other than an int, but we haven't claimed
to prove that, so there's no problem there. I've already described how
errors in our proofs can be detected.

Another possible objection is that I'm talking about local cases, and
not making any claims about extending it to an entire program (other
than to observe that it can be done). But let's take this a step at a
time: considered in isolation, if we can assign types to terms at the
local level in a program, do you agree that these are really types, in
the type theory sense?

If you were to agree, then we could move on to looking at what it means
to have many local situations in which we have fairly well-defined
types, which may all be defined within a common type system.

As an initial next step, we could simply rely on the good old universal
type everywhere else in the program - it ought to be possible to make
the program well-typed in that case, it would just mean that the
provable properties would be nowhere near as pervasive as with a
traditional statically-typed program. But the point is we would now
have put our types into a formal context.
In the same post here, you simultaneously suppose that there's something
inherently informal about the type reasoning in dynamic languages; and
then talk about the type system "in the static sense" of a dynamic
language. How is this possibly consistent?
The point about inherent informality is just that if you fully formalize
a static type system for a dynamic language, such that entire programs
can be statically typed, you're likely to end up with a static type
system with the same disadvantages that the dynamic language was trying
to avoid.

However, that doesn't preclude type systems being defined which can be
used to reason locally about dynamic programs. Some people actually do
this, albeit informally.
So we actually have quite a bit of evidence about the presence of static
types in dynamically-typed programs.
No. What we have is quite a bit of evidence about properties remaining
true in dynamically typed programs, which could have been verified by
static typing.
Using my example above, at least some of those properties would have
been verified by manual static typing. So those properties, at least,
seem to be types, at least w.r.t. the local fragment they're proved within.
We're currently discussing something that so far has only been captured
fairly informally. If we restrict ourselves to only what we can say
about it formally, then the conversation was over before it began.
I agree with that statement. However, I think the conversation
regarding static typing is also over when you decide that the answer is
to weaken static typing until the word applies to informal reasoning.
If the goal isn't to reach a formal understanding, then the word "static
type" shouldn't be used
Well, I'm trying to use the term "latent type", as a way to avoid the
ambiguity of "type" alone, to distinguish it from "static type", and to
get away from the obvious problems with "dynamic type".

But I'm much less interested in the term itself than in the issues
surrounding dealing with "real" types in dynamically-checked programs -
if someone had a better term, I'd be all in favor of it.
and when that is the goal, it still shouldn't
be applied to processes that aren't formalized until they manage to become
formalized.
This comes back to the phrase I highlighted at the beginning of this
message: "performed entirely on the program before execution".
Formalizing local reasoning about types is pretty trivial, in many
cases. It's just that there are other cases where that's less
straightforward.

So when well-typed program fragments are considered as part of a larger
untyped program, you're suggesting that this so radically changes the
picture, that we can no longer distinguish the types we identified as
being anything beyond programmer reasoning in general? All the hard
work we did figuring out the types, writing them down, writing
assertions for them, and testing the program to look for violations
evaporates into the epistemological black hole that exists outside the
boundaries of formal type theory?
Hopefully, this isn't perceived as too picky. I've already conceded
that we can use "type" in a way that's incompatible with all existing
research literature.
Not *all* research literature. There's literature on various notions of
dynamic type.
I would, however, like to retain some word which
actually has that meaning. Otherwise, we are just going to be
linguistically prevented from discussing it.
For now, you're probably stuck with "static type". But I think it was
never likely to be the case that type theory would succeed in absconding
with the only technically valid use of the English word "type".
Although not for want of trying!

Besides, I think it's worth considering why Bertrand Russell used the
term "types" in the first place. His work had deliberate connections to
the real world. Type theory in computer science unavoidably has similar
connections. You could switch to calling it Zoltar Theory, and you'd
still have to deal with the fact that a zoltar would have numerous
connections to the English word "type". In CS, we don't have the luxury
of using the classic mathematician's excuse when confronted with
inconvenient philosophical issues, of claiming that type theory is just
a formal symbolic game, with no meaning outside the formalism.

Anton

[1]
http://people.cs.uchicago.edu/~robby/pubs/papers/ho-contracts-techreport.pdf
Chris Smith
2006-06-26 00:16:08 UTC
Permalink
These informal systems, which may not prove what they claim to prove
are my concept of a "type system".
Okay, that works. I'm not sure where it gets us, though, except for
gaining you the right to use "type system" in a completely uninteresting
sense to describe your mental processes. You still need to be careful,
since most statements from type theory about type systems tend to
implicitly assume the property of being sound.

I'm not exactly convinced that it's possible to separate that set of
reasoning people do about programs into types and not-types, though. If
you want to stretch the definition like this, then I'm afraid that you
end up with the entire realm of human thought counting as a type system.
I seem to recall that there are epistemologists who would be happy with
that conclusion (I don't recall who, and I'm not actually interested
enough to seek them out), but it doesn't lead to a very helpful sort of
characterization.
It is to the people who are reasoning informally about
the program we wish to communicate that there is a type error. That
is, we want someone who is dealing with the program informally to
realize that there is an error, and that this error is somehow related
to some input not being kept within proper bounds (inside the domain
set) and that as a result the program will compute an unexpected and
incorrect result.
I've lost the context for this part of the thread. We want this for
what purpose and in what situation?

I don't think we necessarily want this as the goal of a dynamic type
system, if that's what you mean. There is always a large (infinite? I
think so...) list of potential type systems under which the error would
not have been a type error. How does the dynamic type system know which
informally defined, potentially unsound, mental type system to check
against? If it doesn't know, then it can't possibly know that something
is a type error. If it does know, then the mental type system must be
formally defined (if nothing else, operationally by the type checker.)
I also stress the informality, because beyond a certain nearly trivial
level of complexity, people are not capable of dealing with truly
formal systems.
I think (hope, anyway) that you overstate the case here. We can deal
with formal systems. We simply make errors. When those errors are
found, they are corrected. Nevertheless, I don't suspect that a large
part of the established base of theorems of modern mathematics are
false, simply because they are sometimes rather involved reasoning in a
complex system.

Specifications are another thing, because they are often developed under
time pressure and then extremely difficult to change. I agree that we
should write better specs, but I think the main challenges are political
and pragmatic, rather than fundamental to human understanding.
Therefore, I do not want to exclude from type systems things which are
informal and unsound. They are simply artifacts of human creation.
Certainly, though, when it is realized that a type system is unsound, it
should be fixed as soon as possible. Otherwise, the type system doesn't
do much good. It's also true, I suppose, that as soon as a person
realizes that their mental thought process of types is unsound, they
would want to fix it. The difference, though, is that there is then no
formal definition to fix.
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Joachim Durchholz
2006-06-21 14:15:45 UTC
Permalink
I think Marshall got this one right. The two are accomplishing
different things. In one case (the dynamic case) I am safeguarding
against negative consequences of the program behaving in certain non-
sensical ways. In the other (the static case) I am proving theorems
about the impossibility of this non-sensical behavior ever happening.
And so conflating the two notions of type (-checking) as a kind of category
error ? If so then I see what you mean, and it's a useful distinction, but am
unconvinced that it's /so/ helpful a perspective that I would want to exclude
other perspectives which /do/ see the two as more-or-less trivial variants on
the same underlying idea.
It is indeed helpful.
Just think of all the unit tests that you don't have to write.

Regards,
Jo
Andrew McDonagh
2006-06-25 18:57:51 UTC
Permalink
I haven't read all of this thread, I wonder, is the problem to do with
Class being mistaken for Type? (which is usually the issue)
Hi Andrew!
Hi Chris
Not much of this thread has to do with object oriented languages... so
the word "class" would be a little out of place.
Glad to hear.
However, it is true
that the word "type" is being used in the dynamically typed sense to
include classes from class-based OO languages (at least those that
include run-time type features), as well as similar concepts in other
languages. Some of us are asking for, and attempting to find, a formal
definition to justify this concept, and are so far not finding it.
Others are saying that the definition is somehow implicitly
psychological in nature, and therefore not amenable to formal
definition... which others (including myself) find rather unacceptable.
I started out insisting that "type" be used with its correct formal
definition, but I'm convinced that was a mistake. Asking someone to
change their entire terminology is unlikely to succeed. I'm now
focusing on just trying to represent the correct formal definition of
types in the first place, and make it clear when one or the other
meaning is being used.
Hopefully, that's a fair summary of the thread to date.
Cheers much appreciated!

Andrew
David Hopwood
2006-06-22 21:35:35 UTC
Permalink
As far as I can tell, the people who advocate using "typed" and "untyped"
in this way are people who just want to be able to discuss all languages in
a unified terminological framework, and many of them are specifically not
advocates of statically typed languages.
It's easy to create a reasonable framework. My earlier posts show simple
ways of looking at it that could be further refined; I'm sure there are
others who have already done this.
The real objection to this was that latently/dynamically typed
languages have a place in it.
You seem to very keen to attribute motives to people that are not apparent
from what they have said.
But some of the advocates of statically
typed languages wish to lump these languages together with assembly
language a "untyped" in an attempt to label them as unsafe.
A common term for languages which have defined behaviour at run-time is
"memory safe". For example, "Smalltalk is untyped and memory safe."
That's not too objectionable, is it?

(It is actually more common for statically typed languages to fail to be
memory safe; consider C and C++, for example.)
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Yet Another Dan
2006-06-19 18:21:56 UTC
Permalink
Chris Smith <cdsmith at twu.net> wrote in
A language is latently typed if a value has a property - called its
type - attached to it, and given its type it can only represent
values defined by a certain class.
Now I define a class of values called "correct" values. I define
these to be those values for which my program will produce acceptable
results. Clearly there is a defined class of such values: (1) they
are immediately defined by the program's specification for those lines
of code that produce output; ...
I'm not trying to poke holes in your definition for fun. I am
proposing that there is no fundamental distinction between the kinds
of problems that are "type problems" and those that are not.
That sounds like a lot to demand of a type system. It almost sounds like
it's supposed to test and debug the whole program. In general, defining the
exact set of values for a given variable that generate acceptable output
from your program will require detailed knowledge of the program and all
its possible inputs. That goes beyond simple typing. It sounds more like
contracts. Requiring an array index to be an integer is considered a typing
problem because it can be checked based on only the variable itself,
whereas checking whether it's in bounds requires knowledge about the array.
--
YAD
Rob Thorpe
2006-06-23 17:20:39 UTC
Permalink
It's easy to create a reasonable framework.
Luca Cardelli has given the most convincing one in his seminal tutorial
"Type Systems", where he identifies "typed" and "safe" as two orthogonal
| typed | untyped
-------+-------+----------
safe | ML | Lisp
unsafe | C | Assembler
Now, jargon "dynamically typed" is simply untyped safe, while "weakly
typed" is typed unsafe.
Consider a language something like BCPL or a fancy assembler, but
without quite a 1:1 mapping to machine language.

It differs in one key regard: it has a variable declaration command.
This command allows the programmer to allocate a block of memory to a
variable. If the programmer attempts to index a variable outside the
block of memory allocated to it an error will occur. Similarly if the
programmer attempts to copy a larger block into a smaller block an
error would occur.

Such a language would be truly untyped and "safe", that is safe
according to many people's use of the word, including, I think, yours.

But it differs from latently typed languages like python, perl or lisp.
In such a language there is no information about the type the variable
stores. The programmer cannot write code to test it, and so can't
write functions that issue errors if given arguments of the wrong type.
The programmer must use his or her memory to substitute for that
facility. As far as I can see this is a significant distinction and
warrants a further category for latently typed languages.


As a sidenote: the programmer may also create a tagging system to allow
the types to be tracked. But this is not the same as saying the
language supports types, it is akin to the writing of OO programs in C
using structs.
Darren New
2006-06-21 17:43:37 UTC
Permalink
AFAICT, ADT describes a type whose values can only be accessed by a
certain fixed set of operations.
No. AFAIU, an ADT defines the type based on the operations. The stack
holding the integers 1 and 2 is the value (push(2, push(1, empty()))).
There's no "internal" representation. The values and operations are
defined by preconditions and postconditions.

Both a stack and a queue could be written in most languages as "values
that can only be accessed by a fixed set of operations" having the same
possible internal representations and the same function signatures.
They're far from the same type, because they're not abstract. The part
you can't see from outside the implementation is exactly the part that
defines how it works.

Granted, it's a common mistake to write that some piece of C++ code
implements a stack ADT.

For example, an actual ADT for a stack (and a set) is shown on
http://www.cs.unc.edu/~stotts/danish/web/userman/umadt.html
Note that the "axia" for the stack type completely define its operation
and semantics. There is no other "implementation" involved. And in LOTOS
(which uses ACT.1 or ACT.ONE (I forget which) as its data type language),
this is actually how you'd write the code for a stack, and then the compiler
would crunch a while to figure out a corresponding implementation.

I suspect "ADT" is by now so corrupted that it's useful to use a
different term, such as "algebraic type" or some such.
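To make the "operations plus equations" point concrete, here is a rough
Python sketch (mine, not from the thread); the tuple representation is only
there to make the equations executable, it is not part of the ADT:

    def empty():
        return ()
    def push(x, s):
        return (x, s)
    def top(s):
        x, _rest = s
        return x
    def pop(s):
        _x, rest = s
        return rest

    # The defining equations of the stack ADT, checked on an example:
    s = push(2, push(1, empty()))
    assert top(push(3, s)) == 3   # top(push(x, s)) = x
    assert pop(push(3, s)) == s   # pop(push(x, s)) = s
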
Classes qualify for that, as long as they provide proper encapsulation.
Nope.
OK, I admit that I exaggerated slightly. Although currently I'm indeed
able to mostly work with the more pleasant among languages. :-)
Yah. :-)
(Btw, Pascal did not have it either, AFAIK)
I'm pretty sure in Pascal you could say

Type Apple = Integer; Orange = Integer;
and then vars of type apple and orange were not interchangeable.
--
Darren New / San Diego, CA, USA (PST)
My Bath Fu is strong, as I have
studied under the Showerin' Monks.
David Hopwood
2006-06-26 14:49:12 UTC
Permalink
I agree with Bob Harper about safety being language-specific and all
that. But, with all due respect, I think his characterization of C is
not accurate.
[...]
AFAIC, C is C-unsafe by Bob's reasoning.
Agreed.
Define a state "undefined" that is considered "safe" and add a
transition to "undefined" wherever necessary.
I wouldn't say that was "quite easy" at all.

C99 4 #2:
# If a "shall" or "shall not" requirement that appears outside of a constraint
# is violated, the behavior is undefined. Undefined behavior is otherwise
# indicated in this International Standard by the words "undefined behavior"
# *or by the omission of any explicit definition of behavior*. [...]

In other words, to fix C to be a safe language (compatible with Standard C89
or C99), you first have to resolve all the ambiguities in the standard where
the behaviour is *implicitly* undefined. There are a lot of them.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Joachim Durchholz
2006-06-22 20:01:52 UTC
Permalink
(Btw, Pascal did not have it either, AFAIK)
Indeed.
Some Pascal dialects have it.

Regards,
Jo
Andreas Rossberg
2006-06-21 09:06:01 UTC
Permalink
Oh, but it *does* make sense to talk about dynamic tagging in a statically
typed language.
It even makes perfect sense to talk about dynamic typing in a statically
typed language - but keeping the terminology straight, this rather
refers to something like described in the well-known paper of the same
title (and its numerous follow-ups):

Martin Abadi, Luca Cardelli, Benjamin Pierce, Gordon Plotkin
Dynamic typing in a statically-typed language.
Proc. 16th Symposium on Principles of Programming Languages, 1989
/ TOPLAS 13(2), 1991

Note how this is totally different from simple tagging, because it deals
with real types at runtime.

- Andreas
Marshall
2006-06-19 22:08:38 UTC
Permalink
While I am quite sympathetic to this point, I have to say that
this horse left the barn quite some time ago.
I don't think so. Perhaps it's futile to go scouring the world for uses
of the phrase "dynamic type" and eliminating them. It's not useless to
point out when the term is used in a particularly confusing way, though,
as when it's implied that there is some class of "type errors" that is
strictly a subset of the class of "errors". Terminology is often
confused for historical reasons, but incorrect statements eventually get
corrected.
That's fair.

One thing that is frustrating to me is that I really want to build
an understanding of what dynamic typing is and what its
advantages are, but it's difficult to have a productive discussion
on the static vs. dynamic topic. Late binding of every function
invocation: how does that work, what are the implications of that,
what does that buy you?

I have come to believe that the two actually represent
very different ways of thinking about programming. Which
goes a long way to explaining why there are such difficulties
communicating. Each side tries to describe to the other how
the tools and systems they use facilitate doing tasks that
don't exist in the other side's mental model.
PS. Hi Chris!
Hi! Where are you posting from these days?
I'm mostly on comp.databases.theory, but I also lurk
on comp.lang.functional, which is an awesome group, and
if you're reading Pierce then you might like it too.


Marshall
Joe Marshall
2006-06-22 17:43:42 UTC
Permalink
Assume a language that
a) defines that a program is "type-correct" iff HM inference establishes
that there are no type errors
b) compiles a type-incorrect program anyway, with an established
rigorous semantics for such programs (e.g. by throwing exceptions as
appropriate).
So the compiler now attempts to prove theorems about the program, but
once it has done so it uses the results merely to optimize its runtime
behavior and then throws the results away. I'd call that not a
statically typed language, then.
You're assuming that type-correctness is an all-or-nothing property
(well, technically it *is*, but bear with me). What if the compiler is
unable to prove a theorem about the entire program, but *can* prove a
theorem about a subset of the program. The theorems would naturally be
conditional, e.g. `Provided the input is an integer, the program is
type-safe', or time-bounded, e.g. `Until the program attempts to invoke
function FOO, the program is type-safe.'

Of course, we could encode that by restricting the type of the input
and everything would be copacetic, but suppose there is a requirement
that floating point numbers are valid input. For some reason, our
program is not type-safe for floats, but as a developer who is working
on the integer math routines, I have no intention of running that code.
The compiler cannot prove that I won't perversely enter a float, but
it can prove that if I enter an integer everything is type-safe. I can
therefore run, debug, and use a subset of the program.

That's the important point: I want to run broken code. I want to run
as much of the working fragments as I can, and I want a `safety net' to
prevent me from performing undefined operations, but I want the safety
net to catch me at the *last* possible moment. I'm not playing it safe
and staying where the compiler can prove I'll be ok. I'm living
dangerously and wandering near the edge where the compiler can't quite
prove that I'll fail.

Once I've done the major amount of exploratory programming, I may very
well want to tighten things up. I'd like to have the compiler prove
that some mature library is type-safe and that all callers to the
library use it in a type-correct manner. I'd like the compiler to say
`Hey, did you know that if the user enters a floating point number
instead of his name that your program will crash and burn?', but I
don't want that sort of information until I'm pretty sure about what I
want the program to do in the normal case.
Rob Thorpe
2006-06-19 15:44:49 UTC
Permalink
That's not really the difference between static and dynamic typing.
Static typing means that there exists a typing at compile-time that
guarantees against run-time type violations. Dynamic typing means
that such violations are detected at run-time. This is orthogonal to
strong versus weak typing, which is about whether such violations are
detected at all. The archetypal weakly typed language is machine code
-- you can happily load a floating point value from memory, add it to
a string pointer and jump to the resulting value. ML and Scheme are
both strongly typed, but one is statically typed and the other
dynamically typed.
Knowing that it'll cause a lot of strenuous objection, I'll nevertheless
interject my plea not to abuse the word "type" with a phrase like
"dynamically typed". If anyone considers "untyped" to be perjorative,
as some people apparently do, then I'll note that another common term is
"type-free," which is marketing-approved but doesn't carry the
misleading connotations of "dynamically typed." We are quickly losing
any rational meaning whatsoever to the word "type," and that's quite a
shame.
I don't think dynamic typing is that nebulous. I remember this being
discussed elsewhere some time ago, I'll post the same reply I did then
..


A language is statically typed if a variable has a property - called
its type - attached to it, and given its type it can only represent
values defined by a certain class.

A language is latently typed if a value has a property - called its
type - attached to it, and given its type it can only represent values
defined by a certain class.

Some people use dynamic typing as a word for latent typing, others use
it to mean something slightly different. But for most purposes the
definition above works for dynamic typing also.

Untyped and type-free mean something else: they mean no type checking
is done.
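A small Python illustration of the "latently typed" definition above (my
example, not from the original post): the type travels with the value, and
operations consult it at run time.

    x = "banana"
    print(type(x))            # <class 'str'> -- the value carries its type
    try:
        x + 3                 # the operation checks the operands' types
    except TypeError as err:
        print("runtime type error:", err)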
Darren New
2006-06-23 20:00:13 UTC
Permalink
classes do _not_ have to inherit from Object,
I was unaware of this subtlety. Thanks!
--
Darren New / San Diego, CA, USA (PST)
Native Americans used every part
of the buffalo, including the wings.
Chris Uppal
2006-06-20 17:51:41 UTC
Permalink
Easy, any statically typed language is not latently typed.
I'm actually not sure I agree with this at all. I believe that
reference values in Java may be said to be latently typed. Practically
all class-based OO
languages are subject to similar consideration, as it turns out.
Quite probably true of GC-ed statically typed languages in general, at least up
to a point (and provided you are not using something like a tagless ML
implementation). I think Rob is assuming a rather too specific implementation
of statically typed languages.
I'm unsure whether to consider explicitly stored array lengths, which
are present in most statically typed languages, to be part of a "type"
in this sense or not.
If I understand your position correctly, wouldn't you be pretty much forced to
reject the idea of the length of a Java array being part of its type ? If you
want to keep the word "type" bound to the idea of static analysis, then --
since Java doesn't perform any size-related static analysis -- the size of a
Java array cannot be part of its type.

That's assuming that you would want to keep the "type" connected to the actual
type analysis performed by the language in question. Perhaps you would prefer
to loosen that and consider a different (hypothetical) language (perhaps
producing identical bytecode) which does do compile time size analysis.

But then you get into an area where you cannot talk of the type of a value (or
variable) without relating it to the specific type system under discussion.
Personally, I would be quite happy to go there -- I dislike the idea that a
value has a specific inherent type.

It would be interesting to see what a language designed specifically to support
user-defined, pluggable, and perhaps composable, type systems would look like.
Presumably the syntax and "base" semantics would be very simple, clean, and
unrestricted (like Lisp, Smalltalk, or Forth -- not that I'm convinced that any
of those would be ideal for this), with a defined result for any possible
sequence of operations. The type-system(s) used for a particular run of the
interpreter (or compiler) would effectively reduce the space of possible
sequences. For instance, one could have a type system which /only/ forbade
dereferencing null, or another with the job of ensuring that mutability
restrictions were respected, or a third which implemented access control...

But then, I don't see a semantically critical distinction between such space
reduction being done at compile time vs. runtime. Doing it at compile time
could be seen as an optimisation of sorts (with downsides to do with early
binding etc). That's particularly clear if the static analysis is /almost/
able to prove that <some sequence> is legal (by its own rules) but has to make
certain assumptions in order to construct the proof. In such a case the
compiler might insert a few runtime checks to ensure that its assumptions were
valid, but do most of its checking statically.

There would /be/ a distinction between static and dynamic checks in such a
system, and it would be an important distinction, but not nearly as important
as the distinctions between the different type systems. Indeed I can imagine
categorising type systems by /whether/ (or to what extent) a tractable static
implementation exists.

-- chris

P.S. Apologies Chris, btw, for dropping out of a conversation we were having on
this subject a little while ago -- I've now said everything that I /would/ have
said in reply to your last post if I'd got around to it in time...
Marshall
2006-06-21 15:00:19 UTC
Permalink
When I used the word "type" above, I was adopting the
working definition of a type from the dynamic sense. That is, I'm
considering whether statically typed languages may be considered to also
have dynamic types, and it's pretty clear to me that they do.
I suppose this statement has to be evaluated on the basis of a
definition of "dynamic types." I don't have a firm definition for
that term, but my working model is runtime type tags. In which
case, I would say that among statically typed languages,
Java does have dynamic types, but C does not. C++ is
somewhere in the middle.


Marshall
Andreas Rossberg
2006-06-22 15:30:04 UTC
Permalink
It's worth noting, too, that (in some sense) the type of an object can
change over time[*].
No. Since a type expresses invariants, this is precisely what may *not*
happen. If certain properties of an object may change then the type of
the object has to reflect that possibility. Otherwise you cannot
legitimately call it a type.
Well, it seems to me that you are /assuming/ a notion of what kinds of logic
can be called type (theories), and I don't share your assumptions. No offence
intended.
OK, but can you point me to any literature on type theory that makes a
different assumption?
I see no reason,
even in practise, why a static analysis should not be able to see that the set
of acceptable operations (for some definition of acceptable) for some
object/value/variable can be different at different times in the execution.
Neither do I. But what is wrong with a mutable reference-to-union type,
as I suggested? It expresses this perfectly well.

- Andreas
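As a rough illustration of the "mutable reference-to-union type" suggestion
(a sketch in Python annotation syntax, purely hypothetical): the contents may
change kind over time, but the declared type -- the union -- never does, so
it still expresses an invariant.

    from typing import Union

    class Ref:
        """A mutable cell whose contents are typed as Union[int, str]."""
        def __init__(self, value: Union[int, str]) -> None:
            self.value: Union[int, str] = value

    r = Ref(5)          # currently holds an int
    r.value = "five"    # later holds a str; the type Union[int, str] is unchanged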
Vesa Karvonen
2006-06-21 16:36:10 UTC
Permalink
[...]
Indeed, the ability to declare a new type that has the exact same
underlying representation and isomorphically identical operations but
not be the same type is something I find myself often missing in
languages. It's nice to be able to say "this integer represents vertical
pixel count, and that represents horizontal pixel count, and you don't
get to add them together."
Not counting C/C++, I don't know when I last worked with a typed
language that does *not* have this ability... (which is slightly
different from ADTs, btw)
Would Java count?

-Vesa Karvonen
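(As an aside, a minimal sketch of the "same representation, distinct type"
idea using Python's typing module -- my example, and the distinction is only
visible to a static checker such as mypy, not at run time:)

    from typing import NewType

    VPixels = NewType("VPixels", int)
    HPixels = NewType("HPixels", int)

    def total_height(rows: VPixels) -> VPixels:
        return rows

    total_height(HPixels(640))   # flagged by a static checker; runs fine in Python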
Chris Smith
2006-06-25 18:39:52 UTC
Permalink
I haven't read all of this thread, I wonder, is the problem to do with
Class being mistaken for Type? (which is usually the issue)
Hi Andrew!

Not much of this thread has to do with object oriented languages... so
the word "class" would be a little out of place. However, it is true
that the word "type" is being used in the dynamically typed sense to
include classes from class-based OO languages (at least those that
include run-time type features), as well as similar concepts in other
languages. Some of us are asking for, and attempting to find, a formal
definition to justify this concept, and are so far not finding it.
Others are saying that the definition is somehow implicitly
psychological in nature, and therefore not amenable to formal
definition... which others (including myself) find rather unacceptable.

I started out insisting that "type" be used with its correct formal
definition, but I'm convinced that was a mistake. Asking someone to
change their entire terminology is unlikely to succeed. I'm now
focusing on just trying to represent the correct formal definition of
types in the first place, and make it clear when one or the other
meaning is being used.

Hopefully, that's a fair summary of the thread to date.
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
David Hopwood
2006-06-21 15:29:30 UTC
Permalink
It's worth noting, too, that (in some sense) the type of an object can change
over time[*]. That can be handled readily (if not perfectly) in the informal
internal type system(s) which programmers run in their heads (pace the very
sensible post by Anton van Straaten today in this thread -- several branches
away), but cannot be handled by a type system based on sets-of-values (and is
also a counter-example to the idea that "the" dynamic type of an object/value
can be identified with its tag).
([*] if the set of operations in which it can legitimately partake changes.
That can happen explicitly in Smalltalk (using DNU proxies for instance if the
proxied object changes, or even using #becomeA:), but can happen anyway in less
"free" languages -- the State Pattern for instance, or even (arguably) in the
difference between an empty list and a non-empty list).
Dynamic changes in object behaviour are not incompatible with type systems based
on sets of values (e.g. semantic subtyping). There are some tricky issues in
making such a system work, and I'm not aware of any implemented language that
does it currently, but in principle it's quite feasible.

For a type system that can handle dynamic proxying, see
<http://www.doc.ic.ac.uk/~scd/FOOL11/FCM.pdf>.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Pascal Costanza
2006-06-23 11:32:22 UTC
Permalink
While this effort to salvage the term "type error" in dynamic
languages is interesting, I fear it will fail. Either we'll all
have to admit that "type" in the dynamic sense is a psychological
concept with no precise technical definition (as was at least hinted
by Anton's post earlier, whether intentionally or not) or someone is
going to have to propose a technical meaning that makes sense,
independently of what is meant by "type" in a static system.
What about this: You get a type error when the program attempts to
invoke an operation on values that are not appropriate for this
operation.
Examples: adding numbers to strings; determining the string-length of
a number; applying a function on the wrong number of parameters;
applying a non-function; accessing an array with out-of-bound indexes;
etc.
Yes, the phrase "runtime type error" is actually a misnomer. What one
usually means by that is a situation where the operational semantics
is "stuck", i.e., where the program, while not yet arrived at what's
considered a "result", cannot make any progress because the current
configuration does not match any of the rules of the dynamic
semantics.
The reason why we call this a "type error" is that such situations are
precisely the ones we want to statically rule out using sound static
type systems. So it is a "type error" in the sense that the static
semantics was not strong enough to rule it out.
Sending a message to an object that does not understand that message
is a type error. The "message not understood" machinery can be seen
either as a way to escape from this type error in case it occurs and
allow the program to still do something useful, or to actually remove
(some) potential type errors.
I disagree with this. If the program keeps running in a defined way,
then it is not what I would call a type error. It definitely is not
an error in the sense I described above.
If your view of a running program is that it is a "closed" system, then
you're right. However, maybe there are several layers involved, so what
appears to be a well-defined behavior from the outside may still be
regarded as a type error internally.

A very obvious example of this is when you run a program in a debugger.
There are two levels involved here: the program signals a type error,
but that doesn't mean that the system as a whole is stuck. Instead, the
debugger takes over and offers ways to deal with the type error. The
very same program run without debugging support would indeed simply be
stuck in the same situation.

So to rephrase: It depends on whether you use the "message not
understood" machinery as a way to provide well-defined behavior for the
base level, or rather as a means to deal with an otherwise unanticipated
situation. In the former case it extends the language to remove certain
type errors, in the latter case it provides a kind of debugging facility
(and it indeed may be a good idea to deploy programs with debugging
facilities, and not only use debugging tools at development time).

This is actually related to the notion of reflection, as coined by Brian
C. Smith. In a reflective architecture, you distinguish between various
interpreters, each of which interprets the program at the next level. A
debugger is a program that runs at a different level than a base program
that it debugs. However, the reflective system as a whole is "just" a
single program seen from the outside (with one interpreter that runs the
whole reflective tower). This distinction between the internal and the
external view of a reflective system was already made by Brian Smith.


Pascal
--
3rd European Lisp Workshop
July 3 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/
Anton van Straaten
2006-06-23 07:42:25 UTC
Permalink
The short answer is that I'm most directly referring to "the types in
the programmer's head".
In the database theory world, we speak of three levels: conceptual,
logical, physical. In a dbms, these might roughly be compared to
business entities described in a requirements doc, (or in the
business owners' heads), the (logical) schema encoded in the
dbms, and the b-tree indices the dbms uses for performance.
So when you say "latent types", I think "conceptual types."
That sounds plausible, but of course at some point we need to pick a
term and attempt to define it. What I'm attempting to do with "latent
types" is to point out and emphasize their relationship to static types,
which do have a very clear, formal definition. Despite some people's
skepticism, that definition gives us a lot of useful stuff that can be
applied to what I'm calling latent types.
The thing about this is, both the Lisp and the Haskell programmers
are using conceptual types (as best I can tell.)
Well, a big difference is that the Haskell programmers have language
implementations that are clever enough to tell them, statically, a very
precise type for every term in their program. Lisp programmers usually
don't have that luxury. And in the Haskell case, the conceptual type
and the static type match very closely.

There can be differences, e.g. a programmer might have knowledge about
the input to the program or some programmed constraint that Haskell
isn't capable of inferring, that allows them to figure out a more
precise type than Haskell can. (Whether that type can be expressed in
Haskell's type system is a separate question.)

In that case, you could say that the conceptual type is different than
the inferred static type. But most of the time, the human is reasoning
about pretty much the same types as the static types that Haskell
infers. Things would get a bit confusing otherwise.
And also, the
conceptual/latent types are not actually a property of the
program;
That's not exactly true in the Haskell case (or other languages with
static type inference), assuming you accept the static/conceptual
equivalence I've drawn above.

Although static types are often not explicitly written in the program in
such languages, they are unambiguously defined and automatically
inferrable. They are a static property of the program, even if in many
cases those properties are only implicit with respect to the source
code. You can ask the language to tell you what the type of any given
term is.
they are a property of the programmer's mental
model of the program.
That's more accurate. In languages with type inference, the programmer
still has to figure out what the implicit types are (or ask the language
to tell her).

You won't get any argument from me that this figuring out of implicit
types in a Haskell program is quite similar to what a Lisp, Scheme, or
Python programmer does. That's one of the places the connection between
static types and what I call latent types is the strongest.

(However, I think I just heard a sound as though a million type
theorists howled in unison.[*])

[*] most obscure inadvertent pun ever.
with or without static analysis
with or without runtime type information (RTTI or "tags")
with or without (runtime) safety
with or without explicit type annotations
with or without type inference
Wow. And I don't think that's a complete list, either.
Yup.
I would be happy to abandon "strong/weak" as terminology
because I can't pin those terms down. (It's not clear what
they would add anyway.)
I wasn't following the discussion earlier, but I agree that strong/weak
don't have strong and unambiguous definitions.
Uh, oh, a new term, "manifest." Should I worry about that?
Well, people have used the term "manifest type" to refer to a type
that's explicitly apparent in the source, but I wouldn't worry about it.
I just used the term to imply that at some point, the idea of "latent
type" has to be converted to something less latent. Once you explicitly
identify a type, it's no longer latent to the entity doing the identifying.
But if we ask Javascript what it thinks the type of timestwo is, by
evaluating "typeof timestwo", it returns "function". That's because the
value bound to timestwo has a tag associated with it which says, in
effect, "this value is a function".
Well, darn. It strikes me that that's just a decision the language
designers
made, *not* to record complete RTTI.
No, there's more to it. There's no way for a dynamically-typed language
to figure out that something like the timestwo function I gave has the
type "number -> number" without doing type inference, by examining the
source of the function, at which point it pretty much crosses the line
into being statically typed.
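A Python analogue of the point (timestwo is reconstructed here as a plausible
one-liner; its original definition isn't shown in this part of the thread):

    timestwo = lambda x: x * 2
    print(type(timestwo))   # <class 'function'> -- just a tag; nothing here
                            # records the latent type "number -> number"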
(Is it going to be claimed that
there is an *advantage* to having only incomplete RTTI? It is a
serious question.)
More than an advantage, it's difficult to do it any other way. Tags are
associated with values. Types in the type theory sense are associated
with terms in a program. All sorts of values can flow through a given
term, which means that types can get complicated (even in a nice clean
statically typed language). The only way to reason about types in that
sense is statically - associating tags with values at runtime doesn't
get you there.

This is the sense in which the static type folk object to the term
"dynamic type" - because the tags/RTTI are not "types" in the type
theory sense.

Latent types as I'm describing them are intended to more closely
correspond to static types, in terms of the way in which they apply to
terms in a program.
Hmmm. Another place where the static type isn't the same thing as
the runtime type occurs in languages with subtyping.
Yes.
Question: if a language *does* record complete RTTI, and the
languages does *not* have subtyping, could we then say that
the runtime type information *is* the same as the static type?
You'd still have a problem with function types, as I mentioned above,
as well as expressions like the conditional one I gave:

(flag ? 5 : "foo")

The problem is that as the program is running, all it can ever normally
do is tag a value with the tags obtained from the path the program
followed on that run. So RTTI isn't, by itself, going to determine
anything similar to a static type for such a term, such as "string |
number".

One way to think about "runtime" types is as being equivalent to certain
kinds of leaves on a statically-typed program syntax tree. In an
expression like "x = 3", an inferring compiler might determine that 3 is
of type "integer". The tagged values which RTTI uses operate at this
level: the level of individual values.

However, as soon as you start dealing with other kinds of nodes in the
syntax tree -- nodes that don't represent literal values, or compound
nodes (that have children) -- the possibility arises that the type of
the overall expression will be more complex than that of a single value.
At that point, RTTI can't do what static types do.

Even in the simple "x = 3" case, a hypothetical inferring compiler might
notice that elsewhere in the same function, x is treated as a floating
point number, perhaps via an expression like "x = x / 2.0". According
to the rules of its type system, our language might determine that x has
type "float", as opposed to "number" or "integer". It might then either
treat the original "3" as a float, or supply a conversion when assigning
it to "x". (Of course, some languages might just give an error and
force you to be more explicit, but bear with me for the example - it's
after 3:30am again.)

Compare this to the dynamically-typed language: it sees "3" and,
depending on the language, might decide it's a "number" or perhaps an
"integer", and tag it as such. Either way, x ends up referring to that
tagged value. So what type is x? All you can really say, in the RTTI
case, is that x is a number or an integer, depending on the tag the
value has. There's no way it can figure out that x should be a float at
that point.

Of course, further down when it runs into the code "x = x / 2.0", x
might end up holding a value that's tagged as a float. But that only
tells you the value of x at some point in time, it doesn't help you with
the static type of x, i.e. a type that either encompasses all possible
values x could have during its lifetime, or alternatively determines how
values assigned to x should be treated (e.g. cast to float).
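
In Python terms (int and float standing in for "integer" and "number"; the
snippet is only meant to show the shape of the problem):

    x = 3
    print(type(x))    # <class 'int'> -- the tag on the value 3

    x = x / 2.0
    print(type(x))    # <class 'float'> -- a different tag, observed only
                      # once execution actually reaches this statement

    # Each print reports the tag of whatever value x happens to hold at
    # that moment; neither one gives you a single static type for the
    # variable x over its whole lifetime.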

BTW, it's worth noting at this point, since it's implied by the last
paragraph, that a static type is only an approximation to the values
that a term can have during the execution of a program. Static types
can (often!) be too conservative, describing possible values that a
particular term couldn't actually ever have. This gives another hint as
to why RTTI can't be equivalent to static types. It's only ever dealing
with the concrete values right in front of it, it can't see the bigger
static picture.

Anton
Marshall
2006-06-21 15:04:35 UTC
Permalink
Hmm... I think this distinction doesn't cover all cases.
Assume a language that
a) defines that a program is "type-correct" iff HM inference establishes
that there are no type errors
b) compiles a type-incorrect program anyway, with an established,
rigorous semantics for such programs (e.g. by throwing exceptions as
appropriate).
The compiler might actually refuse to compile type-incorrect programs,
depending on compiler flags and/or declarations in the code.
Typed ("strongly typed") it is, but is it statically typed or
dynamically typed?
I think what this highlights is the fact that our existing terminology
is not up to the task of representing all the possible design
choices we could make. Some parts of dynamic vs. static
are mutually exclusive; some parts are orthogonal. Maybe
we have reached the point where trying to cram everything
into one of two possible ways of doing things isn't going
to cut it any more.

Could it be that the US two-party system has influenced our
thinking?</joke>


Marshall
Dr.Ruud
2006-06-23 16:30:16 UTC
Permalink
I would suggest that at least assembly should be referred to as
"untyped".
There are many different languages under the umbrella of "assembly", so
your suggestion is bound to be false.
--
Anyway, Ruud

"Ordinary is a tiger."
Joachim Durchholz
2006-06-26 21:45:27 UTC
Permalink
There's a close connection between latent types in the sense I've
described, and the "tagged values" present at runtime. However, as
type theorists will tell you, the tags used to tag values at runtime,
as e.g. a number or a string or a FooBar object, are not the same
thing as the sort of types which statically-typed languages have.
Would that be a profound difference, or is it just that annotating a
value with a full type expression would cause too much runtime
overhead?
It's a profound difference. The issue is that it's not just the values
that need to be annotated with types, it's also other program terms.
Yes - but isn't that essentially just auxiliary data from and for the
data-flow analysis that tracks what values with what types might reach
which functions?
In
addition, during a single run of a program, all it can ever normally do
is record the types seen on the path followed during that run, which
doesn't get you to static types of terms. To figure out the static
types, you really need to do static analysis.
Agreed.

Regards,
Jo
Joachim Durchholz
2006-06-14 19:09:08 UTC
Permalink
If a language can express constraints of one kind, that is an increase
in expressiveness.
Agreed.
If a language requires constraints to be expressed in one particular way, that's a
decrease in expressiveness.
Unless alternatives would be redundant.
Having redundant ways to express the same thing doesn't make a language
more or less expressive (but programs written in it become more
difficult to maintain).
So I would say languages that can be statically typed and can be
dynamically typed are the most expressive. Languages that require
static typing or are dynamic but cannot express static typing are less
expressive.
Note that this is a different definition of expressiveness.
(The term is very diffuse...)

I think Felleisen's paper defines something that should be termed
"conciseness".
Whether there's a way to express constraints or other static properties
of the software is something different. I don't have a good word for it,
but "expressiveness" covers too much for my taste to really fit.

Regards,
Jo
David Hopwood
2006-06-24 21:53:04 UTC
Permalink
A type system that required an annotation on all subprograms that do not
provably terminate, OTOH, would not impact expressiveness at all, and would
be very useful.
Interesting. I have always imagined doing this by allowing an
annotation on all subprograms that *do* provably terminate. If
you go the other way, you have to annotate every function that
uses general recursion (or iteration if you swing that way) and that
seems like it might be burdensome.
Not at all. Almost all subprograms provably terminate (with a fairly
easy proof), even if they use general recursion or iteration.

If it were not the case that almost all functions provably terminate,
then the whole idea would be hopeless. If a subprogram F calls G, then
in order to prove that F terminates, we probably have to prove that G
terminates. Consider a program where only half of all subprograms are
annotated as provably terminating. In that case, we would be faced with
very many cases where the proof cannot be discharged, because an
annotated subprogram calls an unannotated one.

If, on the other hand, more than, say, 95% of subprograms provably
terminate, then it is much more likely that we (or the inference
mechanism) can easily discharge any particular proof involving more
than one subprogram. So provably terminating should be the default,
and other cases should be annotated.
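
A small sketch of the proposed distinction, using Python with invented
decorator names as stand-ins for the annotations (no real checker is
implied; the point is only which of the two cases is the rare one):

    def terminates(f):                 # hypothetical default marker:
        return f                       # termination has an easy proof

    def not_proven_terminating(f):     # hypothetical annotation for the
        return f                       # rare cases that need it

    @terminates
    def total(xs):
        # Easy proof: each recursive call gets a strictly shorter list,
        # and list length is a well-founded measure.
        if not xs:
            return 0
        return xs[0] + total(xs[1:])

    @not_proven_terminating
    def collatz_steps(n):
        # General iteration with no known termination proof (whether this
        # halts for every n is an open problem), so under the proposed
        # scheme this is the case that must carry an annotation.
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(total([1, 2, 3]))    # 6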

In some languages, annotations may never or almost never be needed,
because they are implied by other characteristics. For example, the
concurrency model used in the language E (www.erights.org) is such
that there are implicit top-level event loops which do not terminate
as long as the associated "vat" exists, but handling a message is
always required to terminate.
Further, it imposes the
annotation requirement even where the programmer might not
care about it, which the reverse does not do.
If the annotation marks not-provably-terminating subprograms, then it
calls attention to those subprograms. This is what we want, since it is
less safe/correct to use a nonterminating subprogram than a terminating
one (in some contexts).

There could perhaps be a case for distinguishing annotations for
"intended not to terminate", and "intended to terminate, but we
couldn't prove it".

I do not know how well such a type system would work in practice; it may
be that typical programs would involve too many non-trivial proofs. This
is something that would have to be tried out in a research language.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
Chris Smith
2006-06-22 03:29:09 UTC
Permalink
Another language which has *neither* latent ("dynamic") nor
manifest ("static") types is (was?) BLISS[1], in which, like
assembler, variables are "just" addresses[2], and values are
"just" a machine word of bits.
I'm unsure that it's correct to describe any language as having no
latent typing, in the sense that's being used in this thread. It might
be more correct to say "no specified latent typing" and/or "no latent
typing beyond what is provided by the execution environment, including
the CPU, virtual memory system, etc." as appropriate. I am aware of no
hardware environment that really accepts all possible values for all
possible operations without the potential of somehow signaling a type
violation.
--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
Anton van Straaten
2006-06-23 00:54:13 UTC
Permalink
http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-4.html
1.1 Semantics
...
Scheme has latent as opposed to manifest types. Types are associated
with values (also called objects) rather than with variables.
(Some authors refer to languages with latent types as weakly typed
or dynamically typed languages.) Other languages with latent types
are APL, Snobol, and other dialects of Lisp. Languages with manifest
types (sometimes referred to as strongly typed or statically typed
languages) include Algol 60, Pascal, and C.
Maybe this is the original source of the myth that static typing is all
about assigning types to variables...
With all my respect to the Scheme people, I'm afraid this paragraph is
pretty off, no matter where you stand. Besides the issue just mentioned
it equates "manifest" with static types. I understand "manifest" to mean
"explicit in code", which of course is nonsense - static typing does not
require explicit types. Also, I never heard "weakly typed" used in the
way they suggest - in my book, C is a weakly typed language (= typed,
but grossly unsound).
That text goes back at least 20 years, to R3RS in 1986, and possibly
earlier, so part of what it represents is simply changing use of
terminology, combined with an attempt to put Scheme in context relative
to multiple languages and terminology usages. The fact that we're still
discussing this now, and haven't settled on terminology acceptable to
all sides, illustrates the problem.

The Scheme report doesn't actually say anything about how latent types
relate to static types, in the way that I've recently characterized the
relationship here. However, I thought this was a good place to explain
how I got to where I am on this subject.

There's been a lot of work done on soft type inference in Scheme, and
other kinds of type analysis. The Scheme report talks about latent
types as meaning "types are associated with values". While this might
cause some teeth-grinding amongst type theorists, it's meaningful enough
when you look at simple cases: cases where the type of a term is an
exact match for the type tags of the values that flow through that term.

When you get to more complex cases, though, most type inferencers for
Scheme assign traditional static-style types to terms. If you think
about this in conjunction with the term "latent types", it's an obvious
connection to make that what the inferencer is doing is recovering types
that are latent in the source.

Once that connection is made, it's obvious that the tags associated with
values are not the whole story: that the conformance of one or more
values to a "latent type" may be checked by a series of tag checks, in
different parts of a program (i.e. before, during and after the
expression in question is evaluated). I gave a more detailed
description of how latent types relate to tags in an earlier reply to
Marshall (Spight, not Joe).
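
A rough rendering of that last point in Python (isinstance checks standing
in for tag checks, and the function names invented for the example):

    def fahrenheit(celsius):
        # The "latent type" of celsius is something like "number", but no
        # single site declares it; conformance is established piecemeal.
        assert isinstance(celsius, (int, float))    # a tag check before use
        return celsius * 9 / 5.0 + 32

    def report(reading):
        temp = fahrenheit(reading)
        assert isinstance(temp, float)              # another check, elsewhere,
        return "%.1f F" % temp                      # after evaluation

    print(report(21))   # the scattered checks, taken together, approximate
                        # the latent type "number -> number" for fahrenheit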

Anton
