Discussion:
Enhanced type guard syntax
unknown
2003-09-18 09:37:15 UTC
Permalink
Below follows a suggestion regarding enhanced syntax for
writing type guards in function clauses and in other match expressions.

I am posting this to see whether you in the Erlang community have some
(positive or negative) opinions regarding this, and I hope to get a lot of feedback.




Enhanced type guard syntax
---------------------------------------

The situation today is that it is quite cumbersome to write type guards
for a function, and it is unfair that some type constraints can be
expressed directly in the argument list (list, tuple, binary)
while other type constraints must be expressed in the guard
(integer, atom, float, pid, port, fun), where the argument names have to be
repeated.
The idea originally comes from Mats Cronqvist (AXD).

Examples as it is today:
------------------------

% The constraint that A and B must be integers
% must be expressed in the guard notation with the arguments repeated
%
foo(A, B) when integer(A), integer(B) ->
A * B.

% The constraint that it must be a list (with at least one element)
% can be expressed directly in the argument list
%
bar([H|T]) ->
H.



Proposed solution
------------------

Examples with new suggested syntax:
-----------------------------------

% The type constraint expressed directly in the argument list
% Shorter to write
% Exactly the same semantics as the foo example above
% The compiler has the potential to handle these type constraints more
% efficiently than today,
% and this syntax makes the different types of guards more visible
%
foo(A/integer, B/integer) ->
A * B.


{X/float, Y/float} = graph:coordinates(),

The rationale behind this suggestion is that we already have the
Variable/type syntax in the match expressions for binaries (example:
<<Size,B:Size/binary,Rest/binary>> = <<2,"AB","CD">>).
It is unfair that some types can be matched without guard syntax while
others can't.
It will result in a more compact and intuitive way of writing that will
encourage writing type constraints, which will catch errors as early and
efficiently as possible. The allowed type specifiers should be all the
type test BIFs named is_XXX/1, but without the "is_" prefix.

Alternative syntax
-------------------

Of course some other special character could be used to distinguish the
type constraint from other tokens. One idea is to make this an
extension of the '_' (don't care) notation, which would then mean:
don't care about the value, but it should be of a certain type.

foo(A = $integer, B = $integer) ->
A*B.

foo(A = _$integer, B = _$integer) ->

An advantage of this solution is that it only introduces new
variants of a typed don't-care.


/Regards, Kenneth Lundin, Product Manager of Erlang/OTP
unknown
2003-09-18 10:07:27 UTC
Permalink
Hi,
Post by unknown
Below follows a suggestion regarding enhanced syntax for
writing type guards in function clauses and in other match expressions.
I think it is a great proposal, one of those I wish I had thought of first :-)

I think the first way looks better, and is also easier to understand.
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
The alternative today is much hairier to write, so it's a welcome addition
for me.
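
For comparison, the equivalent one writes today would be something like this
(my sketch; graph:coordinates/0 is just the example function from the proposal):

{X, Y} = graph:coordinates(),
true = is_float(X),
true = is_float(Y),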

Will a type mismatch be treated just as any other pattern mismatch?
I suppose there won't be any automatic type coercions (between float and
integer), will there?
Maybe this could be extended to somehow be able to pack the size() guard in a
similar way (a la binary syntax)?
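For instance (pure speculation; the packed form in the comment is invented
syntax):

%% today:
f(B, N) when is_binary(B), size(B) == N -> ok.
%% packed, by analogy with the binary syntax, perhaps:
%% f(B/binary-size(N), N) -> ok.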
Post by unknown
The allowed type specifiers should be all the
type test BIFs named is_XXX/1 but without the "is_" prefix.
I'd like to suggest allowing a "string" type declaration too, making it
equivalent to "list". Maybe sometime there will be a string type...

regards,
Vlad
unknown
2003-09-18 10:59:53 UTC
Permalink
Kenneth Lundin wrote:
...deleted
Post by unknown
------------------------
% The constraint that A and B must be integers
% must be expressed in the guard notation with the arguments repeated
%
foo(A, B) when integer(A), integer(B) ->
A * B.
to help beginners i find it to be a good idea to use ''modern'' erlang
as much as possible in examples, etc. this example should (imho) be:

foo(A, B) when is_integer(A) and is_integer(B) ->
A*B.

...deleted
Post by unknown
Proposed solution
------------------
-----------------------------------
% The type constraint expressed directly in the argument list
% Shorter to write
% Exactly the same semantics as the foo example above
% The compiler has the potential to handle these type constraints more
% efficiently than today,
% and this syntax makes the different types of guards more visible
%
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
this seems to be the best way.
Post by unknown
Alternative syntax
-------------------
Of course some other special character can be used to distinguish the
type constraint from other tokens and one idea could be to make this as
an extension to the '_' (don't care) notation which then indicates don't
care the value but it should be of a certain type.
foo(A = $integer, B = $integer) ->
A*B.
foo(A = _$integer, B = _$integer) ->
Advantages with this solution could be that it does only introduce the
new variants of typed don't care.
surely this alternative syntax is confusing? when i see 'A = _$integer'
i take it to mean that the value is of interest, and the type might be
integer, but i do not care.

foo( _A/integer, _B/integer ) ->

is ''the best'' way to express 'do not care about value, but type must
be correct'.


bengt, who really wants type checking, please.
unknown
2003-09-18 11:22:05 UTC
Permalink
This is a wonderful suggestion. I would welcome it. Furthermore I
would also welcome compile-time checking in cases where it is
practical. Such a small thing could be a stepping stone to some
ultimate type system.
Post by unknown
foo(A/integer, B/integer) ->
A * B.
What is the runtime expense of performing such checks?

The technique I employ is liberal use of unit tests so I can avoid type
checking at runtime and gain a slight performance advantage. But
perhaps this is an illusion? What is the real expense?

Would this lead the way to a user-defined type system some day, one that
even supports recursive type definitions?

-type (blammo, [atom() | blammo()]).
foo (A/blammo, B/blammo) ->
blammize (A, B).

Maybe I've been hanging around ObjectiveCAML too much. M. Logan
probably thinks I'm crazy. ;-)

-e
unknown
2003-09-18 12:14:35 UTC
Permalink
Post by unknown
What is the runtime expense of performing such checks?
There is a little overhead for the basic data structures. It
gets worse for data structures that are complicated, like a record
with fields that contain records, which contain lists, etc.

I wrote a dynamic type checker in 1998; it worked fine. The problem is that
the source code gets changed, and people tend not to like that :0).
Attached is the main module of that project. I saw that it was made for
the old JAM binaries, instead of for BEAM. It got really out of sync, and
you need to adapt it to R9 if you want to use it. I guess it's two days'
work for a good programmer.
Post by unknown
Would this lead the way to a user-defined type system some day, one that
even supports recursive type definitions?
-type (blammo, [atom() | blammo()]).
foo (A/blammo, B/blammo) ->
blammize (A, B).
That's indeed what we are missing. However, there was a philosophy that
all guards should be executable in constant time. In the case of
recursively defined user types, this no longer holds.
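
A guard for Eric's blammo type, written out as an ordinary function (my
sketch of what [atom() | blammo()] means), is inherently linear in the
size of the term:

%% blammo() = [atom() | blammo()]
is_blammo(L) when is_list(L) -> blammo_elems(L);
is_blammo(_) -> false.

blammo_elems([]) -> true;
blammo_elems([H|T]) when is_atom(H) -> blammo_elems(T);
blammo_elems([H|T]) ->
    case is_blammo(H) of
        true  -> blammo_elems(T);
        false -> false
    end;
blammo_elems(_) -> false.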

/Thomas

---
Dr Thomas Arts
Program Manager
Software Engineering and Management
IT-university in Gothenburg
Box 8718, 402 75 Gothenburg, Sweden
http://www.ituniv.se/

Tel +46 31 772 6031
Fax +46 31 772 4899

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dtc_typefun.erl
Type: application/octet-stream
Size: 10229 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20030918/98a84313/attachment.obj>
unknown
2003-09-18 12:30:04 UTC
Permalink
Thomas Arts wrote:
...deleted
Post by unknown
I wrote a dynamic type checker in 1998, worked fine. Problem is that the
have you looked at the new type checker, as presented at the erlang
workshop '03, in uppsala?
would you care to comment upon similarities/differences/etc?


bengt
unknown
2003-09-18 20:02:26 UTC
Permalink
[...] there was a philosophy that all guards should be executable in
constant time.
This philosophy doesn't work out IMHO.
Either the guard is used as a selector. Then, if the guard is
disallowed, people are going to write exactly the same non-constant-time
code in the condition of an "if" statement, and nothing is won.
Or the guard is used as a precondition. In that case, it shouldn't be a
guard (but an assert statement or something - particularly, something
that can be switched off for production code).
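
Something like a compile-time-switchable assert macro would do; a minimal
sketch (compile with -DNOASSERT to strip the checks):

-ifdef(NOASSERT).
-define(ASSERT(Expr), ok).
-else.
-define(ASSERT(Expr), true = (Expr)).
-endif.

%% used as a precondition rather than as a guard:
double(X) ->
    ?ASSERT(is_integer(X)),
    X + X.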

Just my 2c.

Regards,
Jo
unknown
2003-09-18 22:24:41 UTC
Permalink
On Thu, 18 Sep 2003 22:02:26 +0200
Post by unknown
[...] there was a philosophy that all guards should be executable in
constant time.
This philosophy doesn't work out IMHO. [...]
Sing it, brother!

A much nicer world would be:

1) the programmer writes whatever code they want in the "when" test
2) the compiler analyzes that code
3) whatever parts of it have side-effects are rejected
4) whatever parts of it are constant-time are used as the guard
5) whatever parts of it are non-constant-time are factored out into a
"case" statement

But that sort of requires that a) compiler-writers are well paid and b)
the language is optimized for maintainability over efficiency... which
only ever seem to hold true in my private little world.
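
To illustrate steps 4 and 5: suppose the programmer writes (expensive/1
being some hypothetical side-effect-free but non-constant-time predicate)

foo(X) when is_integer(X), expensive(X) -> {ok, X};
foo(_) -> none.

The compiler could treat it as though it had been written like this, with
foo_rest/1 standing for the remaining clauses:

foo(X) when is_integer(X) ->
    case expensive(X) of
        true  -> {ok, X};
        false -> foo_rest(X)    %% fall through to the remaining clauses
    end;
foo(X) ->
    foo_rest(X).

foo_rest(_X) -> none.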

-Chris
unknown
2003-09-19 07:43:50 UTC
Permalink
In the discussion on the type guard syntax I am missing
the vision of what the type language should look like.
Sure, we have the basic types and it all looks nice for
integers and atoms.... but before you introduce a new
syntax for something that already exists, why not give
some thought to what you want to achieve with it?

A new notation does not give you new possibilities in
type checking! In particular one should consider whether
one would allow guards with user-defined types, or guards
with parameterized types, polymorphic types, etc.

For example:

is f(X) when list(integer(X)) ->
a good notation to express that X should be a list of integers?

would that translate into f(X/list(integer)) ?

is f(X,Y) when dictionary(K,V)(X), K(Y) ->
a good notation for a dictionary of key value pairs with
polymorphic type K for keys and V for values and a binding
of the key type for the second argument of the function?
would that be
f(X/dictionary(K,V), Y/K) ?

is f(X/list/list) a notation for list of list or should it be
f(X/list(list)), in which case the type notation for list is both
a constant and a unary function.

is it possible to catch in a type notation:
f(X) when 10<X, X<20 ->
by f(X/[10..20]) ?

My advice is to first discuss a suitable type language, see if that
is realizable without performance penalties and then propose a
syntactic sugar notation for it.

One more remark: in the typed lambda calculus they write a type
annotation along with the variable in the same style as proposed,
but they use : instead of /. I would really be in favour of using that
much more standard notation, thus f(X:list(A), Y:A) -> ...
One can use the same notation to give types to functions, as type
annotations that can be checked by a compiler (without confusing
it with division). It will, though, be confused with module application.
An alternative therefore is ::
Semantically, the : means "is an element of", and a small epsilon could be
used instead. However, we want to keep to the ASCII character set as a
convention, I guess.

sort([]:list(A)) ->
    []:list(A);
sort([X|Xs]:list(A)) ->
    (sort([ Y:A || Y <- Xs, Y < X ]) ++ [X] ++
     sort([ Y:A || Y <- Xs, X =< Y ])):list(A).

These annotations help the type checker, if one is available, and can be
left out when compiling production code.

/Thomas

---
Dr Thomas Arts
Program Manager
Software Engineering and Management
IT-university in Gothenburg
Box 8718, 402 75 Gothenburg, Sweden
http://www.ituniv.se/

Tel +46 31 772 6031
Fax +46 31 772 4899
unknown
2003-09-19 08:31:03 UTC
Permalink
Hi Gurus,

Maybe thinking about how to achieve some form of parameter checking
using existing pattern matching language features will put some kind of
(skewed) perspective on the problem.

If you _need_ to know what type of parameter is being given as an
argument, would it be more explicit and practical to simply use tagged
tuples?
Post by unknown
is f(X) when list(integer(X)) ->
a good notation to express that X should be a list of integers?
perhaps this:

f({list_of_integers,X}) ->

will provide a strong enough guard to prevent illegal use of this
function.
Post by unknown
is f(X,Y) when dictionary(K,V)(X), K(Y) ->
a good notation for a dictionary of key value pairs with
polymorphic type K for keys and V for values and a binding
of the key type for the second argument of the funtion?
again, maybe this will do the job:

f({dictionary_of_key_value_pairs,X},{key_for_dictionary,Y}) ->

(In practice I would think of using a less verbose version!)

After all, type-checking languages are only offering a limited kind of
pattern matching, aren't they?

This scheme also makes it clear that {kelvin,RealNumber} is what it says
it is, not just an abstract numeric type for example. A very desirable
feature in my opinion.

(Combine that with "when float(RealNumber), RealNumber >= 0" to add belt
and braces).

Can someone please write a counter-argument and find the leaks in this
idea, to see if it holds water?

Pete.
unknown
2003-09-19 08:47:59 UTC
Permalink
Post by unknown
f({list_of_integers,X}) ->
this is inflexible for a type checker. The tuple notation is fine, but
the verbose way of writing the atom hides the semantics.
Post by unknown
f({dictionary_of_key_value_pairs,X},{key_for_dictionary,Y}) ->
How do you know that there is a relation between the key in the first atom
and the key in the second atom? They might refer to completely different
things. That is where variables come in.

/Thomas
unknown
2003-09-19 08:55:41 UTC
Permalink
Okay, how about going to extremes:

f({{dictionary,store},X},{{dictionary,key},Y}) ->

Now you know they are related (-: and you can do a general case:

f({{dictionary,_whatever},Thing}) ->

I would agree that my example is not practical, but it may seed another
idea from someone?

Pete.
unknown
2003-09-19 08:53:07 UTC
Permalink
While I'm no guru, I'll give my 2c

From: "Peter-Henry Mander"
Post by unknown
If you _need_ to know what type of parameter is being given as an
argument, would it be more explicit and practical to simply use tagged
tuples?
Post by unknown
is f(X) when list(integer(X)) ->
a good notation to express that X should be a list of integers?
f({list_of_integers,X}) ->
will provide a strong enough guard to to prevent illegal use of this
function.
The idea is certainly interesting, but one problem that I can see is that one
has to either

- return {list_of_integers, Value} from all functions that compute such a beast

or

- be forced to write
Result = fun_that_returns_a_list_of_integers(),
f({list_of_integers, Result}),
instead of
f(fun_that_returns_a_list_of_integers()),

The big advantage of using the real types and not tags is that they are present
in the runtime, while invisible in the code.

regards,
Vlad
unknown
2003-09-19 09:06:18 UTC
Permalink
Hi Vlad,

Vlad Dumitrescu wrote:
[...]
Post by unknown
The idea is certainly interesting, but one problem that I can see is that one
has to either
- return {list_of_integers, Value} from all functions that compute such a beast
Not necessarily, for example:

sum({list_of_integers, Value}) -> ... {integer,Sum}.

or

sum({list_of_integers, Value}) -> ... Sum.

is a design issue.

Or am I missing your point?

This scheme must be applied throughout where "types" are required for it
to work, and I agree that it can become tiresome to write, but design it
right and it will save some pain later.

If you return tagged types, it will force users of the function to honor
the type scheme, and isn't that what the goal is?

Pete.
unknown
2003-09-19 09:23:12 UTC
Permalink
From: "Peter-Henry Mander"
Post by unknown
Post by unknown
- return {list_of_integers, Value} from all functions that compute such a beast
sum({list_of_integers, Value}) -> ... {integer,Sum}.
or
sum({list_of_integers, Value}) -> ... Sum.
is a design issue.
Or am I missing your point?
No, it was my point :-) What I was thinking about was that using this scheme
from the very start of the development (i.e. prototyping) would make it no
longer rapid, and rapid development is a very good thing about Erlang. To
introduce the tags at a later stage is cumbersome.
Post by unknown
This scheme must be applied throughout where "types" are required for it
to work, and I agree that it can become tiresome to write, but design it
right and it will save some pain later.
If you return tagged types, it will force users of the function to honor
the type scheme, and isn't that what the goal is?
I am not sure if forcing developers to do things in a more convoluted way is
always a good thing, even if there are gains somewhere else. What I think of is
the Java exception declaration for each method - it is useful in some ways, but
a pain while developing.

To sum my point, I think the idea is interesting and I hope it will lead to
something really useful.

/Vlad
unknown
2003-09-19 10:28:43 UTC
Permalink
Hi Vlad, Hi Thomas,
Post by unknown
Post by unknown
sum({list_of_integers, Value}) -> ... {integer,Sum}.
or
sum({list_of_integers, Value}) -> ... Sum.
is a design issue.
Or am I missing your point?
No, it was my point :-) What I was thinking about was that using this scheme
from the very start of the development (i.e. prototyping) would make it no
longer rapid, and rapid development is a very good thing about Erlang. To
introduce the tags at a later stage is cumbersome.
Ah, yes, but when prototyping I'm ready to scrap old ideas to introduce
something better or more specific. Is it true that if the prototype has
matured to the point where making changes is difficult, it is no longer
really a suitable prototype?
Post by unknown
Post by unknown
If you return tagged types, it will force users of the function to honor
the type scheme, and isn't that what the goal is?
I am not sure if forcing developers to do things in a more convoluted way is
always a good thing, even if there are gains somewhere else.
I think that if the prototype demonstrates the need for stricter types,
an experienced programmer/designer will introduce them as required, and
avoid the pain later, unless of course the design decision was wrong.

But if the _design_ is wrong, type checking won't help avoid the pain of
making changes to correct it!

The "convolution" is there for a purpose, and tagging makes the reason
clear (if the name is well chosen). If the tagging is inappropriate, or
doesn't permit nested categories of example (Thomas, did my answer of
using n-tuples press any buttons?), or simply not necessary it can be
changed or removed globally. But I'm sure that *adding* tags after
creating a tagless design will be painful and error prone.

It's a design decision that ought to be made early, but it may be delayed
until a simple prototype indicates the need for stricter parameter types.

But looking at my own code, this tagging scheme isn't pervasive, and I
get sufficient control over data with the few tags I've (carefully :-)
chosen to not desire any extra type checking in the compiler.

This may explain why I'm not sure I understand the type-strict argument
properly. Thomas, I need your opinion on this, you've got an attentive
student here :-)

Pete.
unknown
2003-09-19 10:34:47 UTC
Permalink
Just a short comment (for now :)

From: "Peter-Henry Mander"
Post by unknown
I think that if the prototype demonstrates the need for stricter types,
an experienced programmer/designer will introduce them as required, and
avoid the pain later, unless of course the design decision was wrong.
What if the compiler were able to optimize a lot more if supplied with type
information? This has nothing to do with design, but depends on the specific
compiler implementation.

I feel there are several issues at stake and that I can't separate them
properly, so I'll take my time to think it over properly.

/Vlad
unknown
2003-09-19 11:34:45 UTC
Permalink
Hi Vlad,

You're right, there is the issue of optimisation, which may bear on some
languages. But can you give an example of what optimisations would
vastly improve Erlang performance? If straight-line dragster performance
is an issue, maybe Erlang is the wrong tool? Its strengths are
elsewhere. I'm using Erlang as a Megaco signalling stress tester, and by
Jove! it beats the pants off commercial testing tools, because its power
is in concurrency, not so much in single-threaded speed.

Maybe some elaborate and sophisticated data analysis on ordinary Erlang
code would reveal optimisations that a simple human could not envisage?
I would prefer a tool to do the grunt work and save me from making mistaken
"optimisation" judgements. Besides, if the algorithm is wrong anyway,
then the programmer is the problem, not the language.

So we have (as a first draft) type strictness for:-

1) code correctness and proof at compile time

2) trapping errors at runtime,

3) compiler code optimisation pass.
unknown
2003-09-19 11:18:13 UTC
Permalink
Dear Peter
Post by unknown
The "convolution" is there for a purpose, and tagging makes the reason
clear (if the name is well chosen). If the tagging is inappropriate, or
doesn't permit nested categories of example (Thomas, did my answer of
using n-tuples press any buttons?), or simply not necessary it can be
changed or removed globally. But I'm sure that *adding* tags after
creating a tagless design will be painful and error prone.
I think this discussion is mixing up two ideas.
1) There is a proposal to write type guards in a different way.
2) There is an older discussion, namely that people would like to have
some type checking support.

W.r.t. 1), my point was that whatever choice you make, you might
influence the possibilities for 2).

The discussion between Vlad and Peter seems to arise from a
misunderstanding between static type checking and dynamic
type checking. Peter basically proposes to have tags for all
types at runtime. In the present Erlang compiler, all basic types
are already tagged, though not by tuples but by the internal
representation. That is why you can get a runtime type error
when adding an integer to an atom.
If you want to extend that principle with user defined types, you
might do that by tuples and a lot of work, or add something to
the internal representation. No matter how, you need a representation
for it.

Static type checking may use the same representation as the
dynamic one, but checks at compile time whether there is a
type conflict. At runtime there need not be any type information
available, since you are guaranteed (by the checker) that you
will never add an integer to an atom.

In Erlang, I am convinced, you need a combination of dynamic
and static type checking.

Notation does not solve the type-check problem, but it can
complicate it. Therefore, if you favour the type-guard notation
that Kenneth proposed, I think you should realize the consequences
for the type checking.

Oh, and to answer Mike... one should definitely take away
the type guards as they are today, thereby losing backward
compatibility. Otherwise you can write stupid things like:

f(X/integer) when string(X) ->
....

/Thomas

---
Dr Thomas Arts
Program Manager
Software Engineering and Management
IT-university in Gothenburg
Box 8718, 402 75 Gothenburg, Sweden
http://www.ituniv.se/

Tel +46 31 772 6031
Fax +46 31 772 4899
unknown
2003-09-19 11:35:24 UTC
Permalink
Post by unknown
Dear Peter
Hi Thomas,
Post by unknown
If you want to extend that principle with user defined types, you
might do that by tuples and a lot of work, or add something to
the internal representation. No matter how, you need a representation
for it.
Static type checking may use the same representation as the
dynamic one, but checks at compile time whether there is a
type conflict. At runtime there need not be any type information
available, since you are guaranteed (by the checker) that you
will never add an integer to an atom.
Thanks for explaining that, I forgot the static/dynamic aspect.

So (including aspects in an email to Vlad) this has an impact on:-

1) code correctness and checking during compilation, without added
"binary-baggage" at runtime,

2) trapping exceptions at runtime, which is what tagging was meant to
achieve, albeit clumsily, I concede.

3) optimisation (maybe?).
unknown
2003-09-19 17:22:35 UTC
Permalink
On Fri, 19 Sep 2003 12:35:24 +0100
Post by unknown
[...]
So (including aspects in an email to Vlad) this has an impact on:-
1) code correctness and checking during compilation, without added
"binary-baggage" at runtime,
2) trapping exceptions at runtime, which is what tagging was meant to
achieve, albeit clumsily, I concede.
3) optimisation (maybe?).
Congratulations. What started out as a proposal for a frivolous but
otherwise seemingly harmless piece of syntactic sugar has blossomed
into yet another monstrously murky discussion on typing.

(See Pete? You aren't the only one who can be flippant :)

I think we are all missing something simple and obvious here, and
getting hung up in relatively meaningless details.

Here is my thought - Erlang doesn't need another type _checker_. Type
checkers do not appeal to the laziness of Erlang programmers, who don't
want to spell out stuff they already know! What Erlang needs is a type
_inferencer_. (And if the inferencer can figure out other properties
such as side-effectfulness, which shouldn't be too much extra
difficulty, then all the better.)

Note that I'm not suggesting that Erlang should "have [only] inferred
types." I don't think such a concept makes much sense in Erlang. If
you want to tell Erlang that X should be some type y, you should be able
to tell it that (and a guard test is one way to do that.)

But I think that a lot of the time, you actually don't need to do that.
This is mostly an educated guess, but it seems like ~70% or more of the
type consistency that _counts_ could be inferred by a pre-compile pass.

This information could then be used to generate errors (at compile time
*and* run time) and optimizations and documentation... and pretty
kaleidoscope patterns for your screen-saver, if you like.

Simple examples are easy to come by. If you have a function like

hyp(A, B) -> math:sqrt(A * A + B * B).

and you know the type signatures of *, +, and math:sqrt/1, you can
pretty much instantly derive the type signature of hyp/2
(FLOAT x FLOAT -> FLOAT). Basically, you start with the assumption that
every variable can be of any type, and you pare it down by looking at
how it is used. The sets module would be handy for this, because it's
all about sets of types.
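
A toy version of that paring-down step, for the '*' operator say, might
look like this (my own sketch, not a real inferencer):

%% A variable starts out as the set of all types; every use narrows it.
any_type() ->
    sets:from_list([integer, float, atom, list, tuple, pid, port, binary]).

%% Appearing as an operand of '*' narrows a variable to the numeric types.
narrow_mul(TA, TB) ->
    Num = sets:from_list([integer, float]),
    {sets:intersection(TA, Num), sets:intersection(TB, Num)}.

If an intersection ever comes out empty, that is precisely the "disjoint
sets" complaint further down.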

While you are doing this, you can also learn other things about
functions - for example, a function is side-effectless only if all the
functions and operators it calls are also side-effectless - so you can
put this information in its signature, too. Constant-time is trickier,
but I _think_ possibly still possible. (At least, I hope so.)

Now, guards (however they're notated) suddenly play a more important
role in things - they tell the type inferencer what is and is not
allowed. So when you write

foo(A) -> A.

all the type inferencer can come up with for the signature of foo/1 is
ANY -> ANY (where ANY is the set of all types.) But if you add a guard
to foo/1 like so:

foo(A) when is_integer(A) -> A.

suddenly the type inferencer knows foo/1 is INTEGER -> INTEGER. And in
some other place in the file, if you call foo(bar), it will complain, at
compile time, that the sets ATOM and INTEGER are disjoint, you silly,
fallible human, you.

This could also open the door for user-defined guards - because it can
check that guard expressions are indeed side-effectless and
constant-time, and disallow them (or refactor them) if they aren't.

My 2 Zorkmids for this morning,

-Chris
unknown
2003-09-20 09:25:03 UTC
Permalink
Hello,

This is probably a bad time to start a different thread about typing
in Erlang. But I'm doing it anyway :-). In my playing around I have
found a nice type system, and would like to put it on the radar of
Erlang hackers and hear what people doing static-typing for Erlang
think of it.

I refer to CMU Common Lisp (CMUCL). This is a free Lisp system that
has been actively developed since early in the 1980s and continues
today. http://www.cons.org/cmucl/

They have a very interesting approach to typing which is both static
and dynamic. They don't make the language fully statically typed, but
instead do as much as possible with what type information they have
available. The type information available in a Common Lisp program is
very similar to what's available in an Erlang program, mostly from
type-checking guards. This makes me think the CMUCL approach may work
well for Erlang too.

CMUCL is interesting because they do a _lot_ with the type information
available. For example:

* Compiler optimizations. e.g. if some variables are declared to be
floats, you can generate more efficient code for doing arithmetic on
them. Erlang/BEAM does this too - that's why there are so many
'float(X)' type declarations in the Wings3D code that Ulf quoted
(excerpted down below). This is safe because before the
float-specific code is executed the arguments are runtime-checked to
actually be floats -- like in Erlang, type guards/declarations are
not blindly trusted, they are treated as assertions to be checked.

* Type inference. If some variables have specified types, this is
used to figure out the types of more variables. For example, with:

foo(X, Y) when float(X) ->
Z = X * Y,
...

You can figure out that Z is also a float. You also know that Y has
to be a number in any of the "..." code, because otherwise it would
have crashed with badarg on the first line before getting that
far.

* Efficiency notes. If you want a function to be fast, you can tell
the compiler so. It will then report every optimization opportunity
that was missed because of insufficient type information.

Take the Wings3d example again: they have added float(X)
declarations in strategic places to generate fast code. Exactly
which declarations they add is, I believe, based on deep knowledge
of the compiler and BEAM.

In CMUCL it is easier for mortals. You write the function with no
type declarations, and say "this has to be fast". Then the compiler
will print notes to the effect that "Hey, I could generate faster
code if I knew that X, Y, and Z were floats". Then, if you know that
they really should be floats, you add the type guards and get the
good code. Thus you can do effective low-level optimization without
intimate knowledge of the compiler.

* Conservative static type error detection. If the compiler sees
that you will cause a type error, it will give you a warning. It
does this when it can prove that your code _will_ lead to a type
error, not just that it might. For example, this code would generate
a warning:

X + length(X)

because it could never work correctly: X has to be both a list and
an integer, which is impossible. Note that it's a warning and not an
error: maybe you want to test some other parts of the program before
you fix this.

Overall you get a lot of benefits and none of the drawbacks usually
associated (by us Erlang programmers at least) with static typing -
i.e. it makes no restrictions on how you write code and you can ignore
the warnings if you want. In practice you do not ignore the warnings,
because the compiler only bothers to warn you when you've almost
certainly made a mistake.

So what does this mean to us Erlang programmers? Is this something
we'd like? Are recent compiler/BEAM developments already leading us in
this direction? Are there some fundamental reasons that this would not
work as well for Erlang that the type-theorists can tell us? Should I
shut the hell up about Lisp already? :-)

In any case, there are worse ways to spend a lazy afternoon than
reading the CMU Common Lisp user manual -- particularly the section on
advanced compiler usage. It is a seriously excellent work of
programming language implementation, even if you aren't interested in
Lisp per se. The URL is:

http://www.cons.org/cmucl/doc/index.html

Also -- I don't like playing the luddite, but I never understand
conference talks about type systems. I'm not familiar with the
formalism(s) that people use and just see a lot of greek letters. Can
someone explain what formalisms people are using (e.g. is it
declarative semantics, or..?), and perhaps point to a book or two that
one could read to appreciate what's going on?

Here's the Wings3D code that Ulf quoted:

twist_fun(y, {Cx,_,Cz}) ->
fun(U, Min, {X0,Y,Z0})
when float(U), float(Min), float(X0), float(Y), float(Z0) ->
Angle = U*(Y-Min),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = X0 - Cx,
Z = Z0 - Cz,
{X*Cos+Z*Sin+Cx,Y,Z*Cos-X*Sin+Cz}
end;

(BTW, comparison to CL also suggests another possible notation for
writing type guards: "when float(U, Min, X0, Y, Z0) ->", i.e. to make
the type guards n-ary.)

Cheers,
Luke
unknown
2003-09-19 20:53:06 UTC
Permalink
Post by unknown
[...]
1) the programmer writes whatever code they want in the "when" test
2) the compiler analyzes that code
3) whatever parts of it that have side-effects are rejected
How could this be done at compile time while still using run time
function call binding?
If I say erlc modfoo.erl and it uses a guard that references code in
modbar.erl, erlc must be made aware of modbar.erl so it can look for a
"!" in that module.
unknown
2003-09-20 19:25:44 UTC
Permalink
On 19 Sep 2003 15:53:06 -0500
Post by unknown
Post by unknown
1) the programmer writes whatever code they want in the "when" test
2) the compiler analyzes that code
3) whatever parts of it that have side-effects are rejected
How could this be done at compile time while still using run time
function call binding?
If I say erlc modfoo.erl and it uses a guard that references code in
modbar.erl erlc must be made aware of modbar.erl so it can look for a
"!" in that module.
Well I never said it would be easy :) The really tricky part is when
modbar.erl undergoes hot code upgrade.

But even if the functions in a guard were restricted to the current
module, I think it'd be a win for expressivity.

-Chris
unknown
2003-09-18 11:57:03 UTC
Permalink
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
This looks really really nice!

Again, I'd like to ask for user-defined guards, as it would really enhance
... erlang(?).

/Fredrik
unknown
2003-09-18 13:16:48 UTC
Permalink
Post by unknown
This is a wonderful suggestion. I would welcome it. Furthermore I
would also welcome compile-time checking in cases where it is
practical. Such a small thing could be a stepping stone to some
ultimate type system.
Post by unknown
foo(A/integer, B/integer) ->
A * B.
What is the runtime expense of performing such checks?
It depends upon the set of patterns. If we have a polymorphic
function like this:

foo(A/integer, B/...) -> ...;
foo(true, ...) -> ...;
foo(bar, ...) -> ...

Then any half decent pattern matching compiler could generate
pattern matching code like this....

switchOnTag Arg0 {
case integer: ...
case atom: ...
}

But with no type information the pattern matching code might be far
worse.

My general feeling is that "the more you tell the compiler about the
program the better"

As a programmer you shouldn't think about how to write patterns, or
guards for efficiency - let the pattern matching compiler do that.

If you *do* make assumptions about this, your assumptions may well
turn out wrong in the next version of the system, and your
optimisations might turn into pessimisations.


/Joe
unknown
2003-09-18 13:51:50 UTC
Permalink
Post by unknown
% The type constraint expressed directly in the argument list
% Shorter to write
% Exactly the same semantics as the foo example above
% The compiler has the potential to handle these type constraints more
% efficiently than today,
% and this syntax makes the different types of guards more visible
%
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
The rationale behind this suggestion is that we already have the
Variable/type syntax in the match expressions for binaries (example:
<<Size,B:Size/binary,Rest/binary>> = <<2,"AB","CD">>).
It is unfair that some types can be matched without guard syntax while
others can't.
It will result in a more compact and intuitive way of writing that will
encourage writing type constraints, which will catch errors as early and
efficiently as possible. The allowed type specifiers should be all the
type test BIFs named is_XXX/1, but without the "is_" prefix.
Adding guards like this violates one of my favorite design
meta-principles:

" If you add something to a language you have to chuck something
away "

What are we going to chuck away?

Well not "when ->" ...

Since you can't say

foo(X, Y) when X == 2*Y ->

in the new syntax

So you keep "when" - now there are TWO ways of saying things

foo(A/integer, B/integer)

or

foo(A, B) when is_integer(A), is_integer(B) ->

and people will write:

foo(A/integer, B) when is_integer(B) ->
...

etc.

And the integer guard test is proposed to be called integer in one place
and is_integer somewhere else

Adding the new syntax without removing something will *increase* the
complexity of the language. Since the change adds no semantic power
I am against it.

Now if you want to add something add "proper" structs and throw away
records and include files with record headers.

Meta comment - Erlang is becoming more and more complicated.

Can we please start chucking stuff away to make the language simple?

To hasten this process I'd like to add a new declaration to
the language.

All modules should start with:

-needsErlangVersion(N).

Where N is the language version.

Then when you add/remove language features you will know what
version of the compiler handles what feature. There is nothing worse
than writing some code using the "latest" version of the system and
posting the code to somebody using an earlier version just to be told
that "the program doesn't work".

Suppose:

1) I write a program using version (say) 5 of the system,
which has some feature not present in system version 4.
2) I post the program to the net
3) Somebody running version 4 tries running my program

When they do 3 they should get a *helpful* message -

You need version 5 of the system -
do you want me to live-update your system: [y]

They type y - and the system gets the latest version of the system
and installs it for them :-)

What I do not want is for either the compilation or the execution to fail
and for them to send me a bug report - this has happened *many* times.

Then once we have got the versioning in we can start chucking things
away.

Get rid of spawn(Mod,Func,Args) ... and all the stupid
"having to export spawned functions" business - these are holdovers from
the days before funs. You only need spawn(fun() -> ...).
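
That is, instead of (my_mod, init/1 and Arg being made-up stand-ins)

Pid = spawn(my_mod, init, [Arg]),    %% init/1 must be exported for this

you just write

Pid = spawn(fun() -> init(Arg) end), %% no export needed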

Keeping the system backwards compatible builds up *enormous*
problems in the future - by all means add things - but throw away the old
junk at the same time.

If you number the versions with needsErlangVersion(N) you will even be
able to semi-automate the transition from one version to another.

Without knowing the language version this is difficult.

/Cheers

Joe
unknown
2003-09-18 14:16:12 UTC
Permalink
Post by unknown
Adding guards like this violates one of my favorite design
meta-principles: " If you add something to a language you have to chuck
something
Post by unknown
away "
What are we going to chuck away? Well not "when ->" ... Since you can't
say
Post by unknown
foo(X, Y) when X == 2*Y ->
in the new syntax
So you keep "when" - now there are TWO ways of saying things
foo(A/integer, B/integer)
or
foo(A, B) when is_integer(A), is_integer(B) ->
foo(A/integer, B) when is_integer(B) ->
...
etc.
Why not throw away the "when is_integer()" guards? Let types be specified
only using the new syntax, and let other kinds of guards be specified as
before. It seems to me it should work. What do you think?

Of course, this would break backwards compatibility. Since I'm not at Ericsson
(anymore) I can happily propose to write a tool for automated conversion of code
to the new format. :-)

regards,
Vlad
unknown
2003-09-18 15:09:21 UTC
Permalink
On Thu, 18 Sep 2003 16:16:12 +0200
Post by unknown
[...]
Why not throw away the "when is_integer()" guards? Let types be specified
only using the new syntax, and let other kinds of guards be specified as
before. It seems to me it should work. What do you think?
I think you CAN throw "when" completely away IF you take this new syntax
proposal to its logical extreme.

foo(X, Y) when X == 2*Y

becomes

foo(X == 2 * Y)

This could also handle the other cases currently handled by "when" in a
much more readable fashion, viz:

foo(X > 3, Y > 3)
foo(1 < X < 10, is_integer(Y))
foo(X * Y > 100, X < Y)

As you can maybe see, the rules change - no longer is it an argument
list in the same way. The "," symbol takes on the meaning of "and", and
the order that variables are mentioned establishes their bindings (so
all the above examples would bind the first argument to X and the second
to Y, even though sometimes X is mentioned more than once, sometimes
there is no "," at all, etc.)

But this would completely obviate all need for "when" clauses, AFAICS.

What you would probably have to throw away would be easy
constant-matching - you can't have a function head like

foo(100, 200) ->

anymore, because it doesn't "mean" anything; you have to either say

foo(X == 100, Y == 200)

or write a REALLY clever (probably detrimentally so) compiler.
Post by unknown
Of course, this would break backwards compatibility.
Throwing stuff out tends to do that... :)

-Chris
unknown
2003-09-18 21:51:07 UTC
Permalink
Post by unknown
I think you CAN throw "when" completely away IF you take this new syntax
proposal to its logical extreme.
foo(X, Y) when X == 2*Y
becomes
foo(X == 2 * Y)
Bad, incomprehensible.

I don't see the point in this new proposal at all.


/klacke
--
Claes Wikstrom -- Caps lock is nowhere and
Alteon WebSystems -- everything is under control
http://www.bluetail.com/~klacke
cellphone: +46 70 2097763
unknown
2003-09-19 17:35:27 UTC
Permalink
On Thu, 18 Sep 2003 23:51:07 +0200
Post by unknown
[...]
Bad, incomprehensible.
I don't see the point in this new proposal at all.
Brilliantly argued, sir! :)

To clarify my position: I submit that

foo(X > Y) ->

contains the exact same amount of information as

foo(X, Y) when X > Y ->

while being more concise.

I also don't think it's any more difficult to read, once you unlearn the
arbitrary convention that "comma seperates arguments".

(Not that I'm proposing it for Erlang, since such a huge change would be
lethal to backwards-compatibility. But as a language design idea in its
own right, I like it.)

-Chris
unknown
2003-09-18 14:46:55 UTC
Permalink
Joe Armstrong wrote:
...deleted
Post by unknown
Adding guards like this violates one of my favorite design
" If you add something to a language you have to chuck something
away "
although i am very much in favour of this meta-principle (i got very
strong reactions when suggesting a removal before (as in: not at) the
scheme strawman workshop (ICFP '98)), it does not apply in this
particular case.

this is not adding something, instead it is extending an already
existing feature to be more orthogonal. it should make erlang easier to
learn. Var/binary is no longer a special case, but just another example
of Var/builtin_type.

imho.


on the other hand, having both
foo(A, B) when integer(A), integer(B) ->
and
foo(A, B) when is_integer(A) and is_integer(B) ->
is a very strong violation of the meta-principle. it should be dealt
with asap.


bengt
unknown
2003-09-19 06:40:07 UTC
Permalink
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
***

I think records should also be encompassed by this notation. But how to extend
it to allow specification of a record name?

foo(A/record#myRecord) ->
foo(A/#myRecord) ->

The first one is more consistent with the parallel to the is_XXX BIFs, but
longer.

***
Post by unknown
Inability to comprehend the value that this syntax would add makes me think
that the whole thing is about ideology and programming puritanism.
As I see it, it's only syntactic sugar. The value is most visible when the only
guards are type declarations, and when there are many parameters with longish
names.

myFun(FirstArgument, SecondArgument, ThirdArgument)
when is_integer(FirstArgument),
is_float(SecondArgument),
is_list(ThirdArgument) ->

myFun(FirstArgument/integer, SecondArgument/float, ThirdArgument/list) ->

regards,
/Vlad
unknown
2003-09-19 06:57:29 UTC
Permalink
Hi Joe and the Gurus,
Post by unknown
Adding guards like this violates one of my favorite design
" If you add something to a language you have to chuck something
away "
What are we going to chuck away?
Well not "when ->" ...
Maybe I'm just weird, but I don't find "when ->" offensive in the least.
What's fundamentally wrong with it? It does a lot that can't be done
otherwise, and seems to me to be orthogonal to other language features.

Syntactic sugar only makes a language fat.
Post by unknown
Can we please start chucking stuff away to make the language simple?
In this case, may I propose that "if" and "case" be stuffed into the
blender? They're _not_ orthogonal to other features!

I cannot find any solid argument for keeping and using them, when
writing equivalent code using functions alone makes (at least my) code
much less prone to the Dreaded Indent pushing against the right margin.
Not to mention that the resulting function names can be well chosen too.
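
For example (lookup/1 and the zero default are made up for illustration):

%% with case:
value(Key) ->
    case lookup(Key) of
        {ok, V}   -> V;
        not_found -> 0
    end.

%% with a well-chosen function name instead:
value(Key) -> found_or_zero(lookup(Key)).

found_or_zero({ok, V})   -> V;
found_or_zero(not_found) -> 0.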
Post by unknown
To hasten this process I'd like to add a new declaration to the
the language.
All modules should start.
-needsErlangVersion(N).
Where N is the language version.
Yes. I like this idea. In its absence, does it default to the version of
today's Erlang?
Post by unknown
They type y - and the system gets the latest version of the system
and installs it for them :-)
That sounds horribly like Windows Update! Surely we can assume that
users know where to get the latest version of Erlang? Or am I being
presumptuous?
Post by unknown
Then once we have got the versioning in we can start chucking things
away.
...
Post by unknown
You only need spawn(fun() -> ...)
As I discovered recently, it's much cleaner. Yes, I'm a slow learner,
(-: but I haven't had the benefit of a good training course )-: Bloody
finances!

Pete.
unknown
2003-09-19 07:53:48 UTC
Permalink
} Not to mention that the resulting function names can be well chosen too.

I get stuck thinking about what to name things. foo2, foo3, foo4 are
easy enough but anything else leaves me idle pondering how a name should
be chosen. I like to use funs because I don't need to name them or if I
do I can call them F or Fun.

-Vance

On Fri, Sep 19, 2003 at 07:57:29AM +0100, Peter-Henry Mander wrote:
}
} In this case, may I propose that "if" and "case" were stuffed into the
} blender? They're _not_ orthogonal to other features!
}
} I cannot find any solid argument toward keeping and using them when
} writing equivalent code using functions alone makes (at least my) code
} much less prone to the Dreaded Indent pushing against the right margin.
} Not to mention that the resulting function names can be well chosen too.
unknown
2003-09-18 13:57:55 UTC
Permalink
Post by unknown
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
This looks really really nice!
Again, I'd like to ask for user-defined guards, as it would really enhance
... erlang(?).
No - guards are extensions of patterns. They have to be simple,
so that a pattern matching compiler can do a good job on them.

If we had user defined guards somebody will write:

foo(X) when bar(X), a == b ->
...

where bar(X) is a user-defined guard.

Now what happens if somebody writes:

bar(P) -> P ! a, true.

But the guard (i.e. a == b) fails - did the message get sent?????

Cheers

/Joe
unknown
2003-09-19 15:59:26 UTC
Permalink
Erlang and concurrent programming rely on good programmer practice to
create reliable, comprehensible programs. Many of these idioms are
embodied in OTP. Still, one can choose to use OTP or not, and if not,
still write reasonable programs. Some things are too scary to leave up
to idiom and the good conscience of any Erlang programmer. While more
type checking would doubtless provide some benefit to the programmer,
user-defined guards at the same time open the door for real
inconsistency and uncertainty if one is not well informed and
painstaking in writing them.

I call this with func(d, Arg).

func(a, Arg) when user_side_effect(Arg) -> a;
func(b, Arg) when user_side_effect(Arg) -> b;
func(c, Arg) when user_side_effect(Arg) -> c;
func(_, Arg) when user_side_effect(Arg) -> error.

I assume that only the fourth clause will ever be executed. I
consequently assume the expression associated with clause 4 is the
only expression having the potential to influence the state or
*significant* timing of my system. This assumption is based on the
assumption that the people writing guard expressions are following the
strict rules associated with writing such an expression. This would all
go out the window if guards are not considered sacred anymore.

If user-defined guards were to be allowed, some other language/context
would have to be devised to write them. This language would provide the
necessary safeguards - like no messaging or "put" calls and such.
Basically it would not allow much past a basic declarative syntax for
defining the structure of a thing, and perhaps some ability to define
these things recursively. I think we've seen this before in many other
type systems. That is fine for Erlang, if anyone could ever implement it
without crippling what is Erlang. Having users create their own type
system, in essence, for every function (user-defined guards) seems like
a no-no.

Cheers,
Martin
Post by unknown
Post by unknown
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
This looks really really nice!
Again, I'd like to ask for user-defined guards, as it would really enhance
... erlang(?).
No - guards are extensions of patterns. They have to be simple,
so that a pattern matching compiler can do a good job on them.
foo(X) when bar(X), a == b ->
...
Where bar(X) is a user-defined guard.
bar(P) -> P ! a, true.
But the guard (i.e. a == b) fails, did the message get sent?????
Cheers
/Joe
unknown
2003-09-18 14:13:11 UTC
Permalink
-----Original Message-----
From: Joe Armstrong [mailto:joe]
Sent: den 18 september 2003 15:58
To: Fredrik Linder
Cc: erlang-questions
Subject: RE: Enhanced type guard syntax]
Post by unknown
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
This looks really really nice!
Again, I'd like to ask for user-defined guards, as it would
really enhance
Post by unknown
... erlang(?).
No - guards are extensions of patterns. They have to be simple,
so that a pattern matching compiler can do a good job on them.
foo(X) when bar(X), a == b ->
...
Where bar(X) is a user-defined guard.
bar(P) -> P ! a, true.
But the guard (i.e. a == b) fails, did the message get sent?????
Cheers
/Joe
Yes, you are right in the sense that code that utilizes distribution
should not be allowed in guard functions, and neither should code
utilizing "permanent" storage, such as accessing ets tables and files.

And I do realize the potentially overwhelming task of keeping track of
these things.

However, allowing only a limited set of operations in guard-declared
functions (such as only allowing them to call other guard-functions)
could be the fine line that increases usability while still preserving
the consistency of the program.

The usage I am targeting is to have user-defined guards as
inspectors/selectors on ADTs.
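
For example, one might want to write something like this (hypothetical
syntax and names -- what the proposal would permit, not current
Erlang):

-module(shapes).
-export([is_square/1]).

%% An ordinary, side-effect-free inspector that the proposal would
%% allow to appear in guards.
is_square({square, _Side}) -> true;
is_square(_) -> false.

%% In some client module:
area(S) when shapes:is_square(S) -> square_area(S).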

Cheers
/Fredrik
unknown
2003-09-18 14:34:53 UTC
Permalink
Post by unknown
Post by unknown
foo(A/integer, B/integer) ->
A * B.
{X/float, Y/float} = graph:coordinates(),
This looks really really nice!
Actually, I've changed my mind... <:-) A little too fast there, I was. I
think it reduces readability, so I vote against. :-) I like the separation
the when-clause provides.

Cheers
/Fredrik
unknown
2003-09-18 18:45:29 UTC
Permalink
Sometimes it might be useful to be able to write:

isOne( X ) when X == 1; X == "one" -> true;
isOne( X ) -> false.

Why would I ever want to (re)write it like this:

isOne( X/integer ) ->
case X of
1 -> true;
_ -> false
end;

isOne( X/string ) ->
case X of
"one" -> true;
_ -> false
end.

Most languages are either STRONGLY or weakly typed.
Erlang can be both and more -- it can match patterns that can
be specified/grouped through their semantics, not only syntax.
For example, when we say ONE, we can write it as "1" or
"one" -- it wouldn't change the meaning, right?


Valentin.
unknown
2003-09-18 19:13:47 UTC
Permalink
Hi,
Post by unknown
isOne( X ) when X == 1; X == "one" -> true;
isOne( X ) -> false.
Why not

isOne(1) -> true;
isOne("one") ->true;
% or maybe even
% isOne("one")->isOne(1);
isOne(_) ->false.

? Looks simpler to me ;-)
Post by unknown
Most of the languages are either STRONGLY or weakly typed.
Erlang can be both and more -- it can match patterns that can
be specified/grouped through their semantics, not only syntax.
For example, when we say ONE, we can write it as "1" or
"one" -- it wouldn't change the meaning, right?
In your example, the guards X==1 and X=="one" are not typing guards, but
matching ones. Thus according to the proposal, they would remain valid.

regards,
Vlad
unknown
2003-09-18 20:00:34 UTC
Permalink
...packages... parametrized modules... new guard syntax...

Perhaps it is time to remove the cobweb from the good old
discussion topic of creating a new language 'Erlang2' (i.e.
Erlang done right). Perhaps a little switch to the compiler,
in a transition phase + some automated conversion tool:
Erlang -> E2 ...etc...

Cheers , Tobbe
unknown
2003-09-18 20:23:45 UTC
Permalink
Post by unknown
For example, when we say ONE, we can write it as "1" or
"one" -- it wouldn't change the meaning, right?
<aside note>
Er... I wouldn't consider that Good Style (an interface shouldn't accept
inputs that are so ill-defined as natural language), but of course there
are similar and worse interfaces in practice. (Heck, I have been forced
to do some interfaces of this kind myself.)
For example, if a library has lived long enough, it will accept inputs
of "modern" and of "ancient" style, which tend to be quite different
data structures.
</aside note>

I think two things are getting conflated here: guards and preconditions.

A guard is a kind of selector: the first implementation that has guards
that match a given set of parameters is chosen. (In OO, all guards are
on "typeof (first argument) <= typeof (first parameter)", so, in a
sense, Erlang is indeed OO, but not on the message level *grin*.)
If a given function has the wrong guards, the system will try another
one, it will not fail.

A precondition is unrelated to implementation. If the precondition
fails, the caller has done something evil. This is things like calling
sqrt with a negative argument, or, say, bank transfers with identical
source and destination accounts. Any routine added that has a matching
guard is simply wrong, it violates the contract that's supposed to go
with the function of that description.
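
A minimal sketch of the distinction in today's Erlang (my_abs,
transfer and do_transfer are made-up names):

%% Guard: selects between implementations; a non-matching clause
%% just means the next one is tried.
my_abs(X) when X >= 0 -> X;
my_abs(X) -> -X.

%% Precondition: a violated contract is a caller error, so no other
%% clause should quietly pick the call up.
transfer(From, To, _Amount) when From == To ->
    exit({precondition_violated, same_account});
transfer(From, To, Amount) ->
    do_transfer(From, To, Amount).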

Preconditions are valuable. Both as a kind of built-in unit test, and as
a documentation aid.
For testing, have the system compile the stuff with precondition
checking enabled. For extra convenience, have the system compile them in
anyway, and have them activated and deactivated by the run-time system.
For generous convenience, have them activated and deactivated on a
per-module or even per-function basis :-) (oh, and activate and
deactivate them in the *calling* places, you usually want to debug the
caller if you're checking precondition violations *g*)
For documentation, it's simply both more precise and concise to have
sqrt (X >= 0.0)
instead of the verbose and natural-language all-too-often-fuzzy
sqrt (X)
%% X must be >= 0.0
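
(Today the nearest approximation is an explicit head guard plus a
fail-fast clause -- a sketch, nothing more:

sqrt(X) when is_number(X), X >= 0.0 -> math:sqrt(X);
sqrt(X) -> exit({precondition_failed, {sqrt, X}}).

The first clause is the contract made checkable; the second makes a
violation loud and descriptive instead of a generic match failure.)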


Syntax ideas (just thinking loudly):

Ideally, one would use something like
function (X/guard)
where "guard" is a fun() that accepts a single parameter, and returns
True or False.
For example, this would allow one to write stuff like
is_real (X) and X >= 0.0
Unfortunately, taking out the X and getting this into a fun() is going
to be *very* ugly. Too much syntactic cruft around here.
Haskell solves the issue far more elegantly IMHO: any "incomplete"
expression (which is an expression missing its last parameter) is
automatically a fun(). The above as a fun() would then look like this:
function (X / is_real and ('>=' 0))
where the constituents are:
'>=' is the ordinary >= operator, just as a function of two parameters
(i.e. you'd use this as '>=' 0 X to mean "0 >= X"); this is just
syntactic sugar to convert the >= symbol from an operator into a
function name.
'>=' is a function of two parameters. In the above, the first parameter
is given, yielding the function ('>=' 0), which is the function that
returns True if its only parameter is greater than or equal to zero.
is_real is just the name of a function, a one-parameter function that
takes a single parameter.
"and" is actually and/3, not the ordinary and/2. It take two
single-parameter functions and a value; when called, it feeds the value
to the functions and returns the logical AND of the two function result.
(I'm not sure whether the compiler can be made to recognize that the
"and" in the "is_real and ('>=' 0)" bit actually is and/3. Maybe some
syntactic sugar is required. Well, I'm just thinking loudly anyway,
without any claims to relevance *g*.)

Hmm... no, it won't do. Erlang identifies functions by name and
parameter count. If I write "foo X", the compiler expects it to be a
call to foo/1, not a scantily clad foo/2 missing its second parameter.
We'd need something like an extra "dummy" parameter, something like
"foo X _".
In the above, this would be
function (X / is_real (_) and ('>=' 0 _))
or even
function (X / is_real (_) and 0 <= _)
which isn't much of an advantage over
function (X / is_real (X) and 0 <= X)
but remember that parameter names can be *much* longer than a single
character :-)


Anyway, on to another aspect: guards that involve multiple parameters.
For a contrived example, assume
distribute_percentages (X, Y, Z)
%% X + Y + Z must sum up to 100.
I'd prefer to write this as
distribute_percentages (
X / in_range (0, 100, _),
Y / in_range (0, 100, _),
Z / in_range (0, 100, _) and X + Y + Z = 100.0)
(See how the formalization forced me to become quite explicit about what
a "percentage" is?)


Just my 2c.

Regards,
Jo
unknown
2003-09-18 15:12:38 UTC
Permalink
Post by unknown
Adding guards like this violates one of my favorite design
principles: "If you add something to a language you have to chuck
something away."
What are we going to chuck away?
Well not "when ->" ...
Since you can't say
foo(X, Y) when X == 2*Y ->
in the new syntax
Adding the new syntax without removing something will *increase* the
complexity of the language. Since the change adds no semantic power
I am against it.
In principle, I agree. However, one big problem with the 'when... ->'
syntax is that it's exceedingly ugly (well, at least exceedingly
cumbersome.)

The following code snippet from wings_deform.erl:

twist_fun(y, {Cx,_,Cz}) ->
fun(U, Min, {X0,Y,Z0})
when float(U), float(Min), float(X0), float(Y), float(Z0) ->
Angle = U*(Y-Min),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = X0 - Cx,
Z = Z0 - Cz,
{X*Cos+Z*Sin+Cx,Y,Z*Cos-X*Sin+Cz}
end;

would be possible to write as:

twist_fun(y, {Cx,_,Cz}) ->
fun(U/float, Min/float, {X0/float, Y/float, Z0/float}) ->
Angle = U*(Y-Min),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = X0 - Cx,
Z = Z0 - Cz,
{X*Cos+Z*Sin+Cx,Y,Z*Cos-X*Sin+Cz}
end;

which I personally think is a significant face lift.
Note for example that the complete function head in the
second version is shorter than the guard declaration alone
in the first version. That, and the first version forces the
reader to jump back and forth in order to match the guards
with the arguments.

In this particular code (I believe), type guards are important
from a performance perspective, since floating point operations
are performed more efficiently if the compiler can _know_ that
the input parameters are really floats.

In general, type guards should be used more because they add
safety and clarity. Unfortunately, with today's guard syntax,
they are unnecessarily clumsy, and can actually serve to
obfuscate the code rather than clarifying it.

Also, it's consistent with the bit syntax, so the X/Type
syntax has already been added to the language.

For these reasons, I support the proposal.

/Uffe
unknown
2003-09-19 09:07:40 UTC
Permalink
Someone (I think it was Joe) said that if you add something to a language
you should also be prepared to take something away. I.e. you don't want to
let the language become a monster like PL1 (or Perl). So, if we were to
add Kenneth's syntax, what will we take away?

/mike
unknown
2003-09-22 05:18:08 UTC
Permalink
Joe Armstrong <joe> wrote:
It depends upon the set of patterns.
If we have a polymorphic function like this:

foo(A/integer, B/...) ->
foo(A/integer, B/...)
foo(true, ...) ->
foo(bar, ...) ->

Then any half decent pattern matching compiler could generate
pattern matching code like this....

switchOnTag Arg0 {
case integer: ...
case atom: ...
}

But any half-way decent pattern-matching compiler *should* be generating
pattern-matching code like that anyway for

foo(A, B) when integer(A), ... ->
foo(A, B) when integer(A), ... ->
foo(true, ...) when ... ->
foo(bar, ...) when ... ->

Someone under Udi Shapiro's supervision wrote a compiler for FCP that did
that years back. I *think* the name was Shmuel Kliger, but I am probably
wrong, and I *think* the year I read it was 1990, but ditto. And of course
I may be misremembering it.

Basic ideas:
(1) Compile pattern matches and guards together
(2) Do guard tests when the data are known to be available
(for Prolog, a variable may be tested after the _last_ mention of
it in a pattern; for Erlang it may be tested after the _first_
mention). So integer(A) may be scheduled at any point after the
first mention of A in the head, A > B may be done at any point
after A and B both become available, and so on.
(3) Introduce the generalisations of a test:
integer(A) => atomic(A) & number(A) & integer(A)
A = 27 => integer(A).... & A = 27
A = true => atomic(A) & atom(A) & A = true
hd(X) > 2 => is_pair(X) & [H|_] = X & atomic(H) & number(H) & H > 2
...
(4) If a test is implied by tests you have already done, emit no code
for it (a toy sketch of this step follows the list).
(5) When building the tree, at each node pick the most useful test.
I've forgotten the exact rule, but it's something about filtering
out the largest number of clauses, and something about expense.
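
Here is a toy sketch of step (4), deciding when a type test is already
entailed by tests performed earlier (the implication table is
illustrative, not complete):

%% implies(Done, Wanted): does having checked Done make Wanted redundant?
implies(T, T) -> true;
implies(integer, number) -> true;
implies(float, number) -> true;
implies(integer, atomic) -> true;
implies(float, atomic) -> true;
implies(number, atomic) -> true;
implies(atom, atomic) -> true;
implies(_, _) -> false.

redundant(Test, AlreadyDone) ->
    lists:any(fun(D) -> implies(D, Test) end, AlreadyDone).

So once integer(A) has been tested, redundant(number, [integer])
returns true and no code need be emitted for a later number(A).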

My general feeling is that "the more you tell the compiler about the
program the better"

As a programmer you shouldn't think about how to write patterns, or
guards for efficiency - let the pattern matching compiler do that.

Right. If Erlang _were_ to be extended with IBM-PROLOG-like type tests
in patterns (*YUCK*) then it should make absolutely no difference to the
compiler whether you do it that way or not.
unknown
2003-09-22 05:21:40 UTC
Permalink
"Fredrik Linder" <fredrik.linder> However, by allowing only a limited set of operations in guard-declared
functions (such as only allowing then to call other guard-functions) could
be the fine line that increases usability while still preserving the
consistency of the program.

The usage I am targeting is to have user-defined guards as
inspectors/selectors on ADTs.

Except for the unnecessarily ugly syntax and not being quite as fully
worked out, how is this different from my old "Abstract Patterns" proposal?
unknown
2003-09-22 05:38:20 UTC
Permalink
Ulf_Wiger <ulf.wiger> wrote:

The following code snippet from wings_deform.erl:

twist_fun(y, {Cx,_,Cz}) ->
fun(U, Min, {X0,Y,Z0})
when float(U), float(Min), float(X0), float(Y), float(Z0) ->
Angle = U*(Y-Min),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = X0 - Cx,
Z = Z0 - Cz,
{X*Cos+Z*Sin+Cx,Y,Z*Cos-X*Sin+Cz}
end;

I have to ask "what are those float/1 tests doing there at all?"
Surely the normal use of guards is to select between alternatives?
There is no alternative here.
All that these guards are doing is turning _meaningful_ calls
(like F(0,0,{0,0,0})) into errors. What _would_ be useful from an
error-detection point of view would be to check that Cx and Cz are
numbers, but that is _not_ done.

Surely the best way to simplify that code is

twist_fun(y, {Cx,_,Cz}) ->
% maybe check Cx, Cz are numbers before it's too late?
fun(U, Min, {X0,Y,Z0}) ->
Angle = U*(Y-Min),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = X0 - Cx,
Z = Z0 - Cz,
{X*Cos+Z*Sin+Cx,Y,Z*Cos-X*Sin+Cz}
end;

In this particular code (I believe), type guards are important
from a performance perspective, since floating point operations
are performed more efficiently if the compiler can _know_ that
the input parameters are really floats.

It's not clear to me that tweaking performance in very minor ways
at the price of turning meaningful calls into errors is a good way for
Erlang to go. Note that with the original code the compiler does NOT
know that Cx and Cz are floats (or even that they are numbers) so that
the last three lines of calculations have to be done in a general way
anyway, and that the cost is likely to be dominated by the calls to
math:cos/1 and math:sin/1.

Is there a way you can get the speed benefit of knowing that something
is a float _without_ going out of your way to lock out meaningful calls
and _without_ any language changes? Yes:

twist_fun(y, {Cx,_,Cz}) ->
Cxf = float(Cx), % check numeric & convert now
Czf = float(Cz), % check numeric & convert now
fun(U, Min, {X0,Y,Z0}) ->
Angle = float(U)*(float(Y) - float(Min)),
Cos = math:cos(Angle),
Sin = math:sin(Angle),
X = float(X0) - Cxf,
Z = float(Z0) - Czf,
{X*Cos+Z*Sin+Cxf, Y, Z*Cos-X*Sin+Czf}
end;

Now the compiler does know what is going on, _and_ other numeric
types can be passed in. There's no extra cost for arguments that _are_
floats, because the check had to be done anyway.

Also, it's consistent with the bit syntax, so the X/Type
syntax has already been added to the language.

I hadn't really regarded those things as types but as formats;
possibly because I thought of them as related to Perl's "pack" and
"unpack".
unknown
2003-09-22 06:13:08 UTC
Permalink
Concerning the examples:

foo(X == 2 * Y)
% foo(X, Y) when X == 2*Y
foo(X > Y)
% foo(X, Y) when X > Y

am I the only one who is bothered (to put it politely) that the
"improved" versions _look_ as if they have one argument, but are
supposedly equivalent to functions with _two_ arguments, and that
there's no obvious way to tell which argument is which?

For some reason, Haskell programmers don't seem to mind writing

foo x y | x == 2*y =
...
foo x y | x < y =
...

"Chris" wrote:

I also don't think it's any more difficult to read, once you
unlearn the arbitrary convention that "comma separates arguments".

It may be arbitrary, but so are all human communication conventions,
and this one has been going strong in programming languages for 50 years.
"As a language design idea in its own right", I very much dislike a
notation where I can't tell which argument is which.

Now let's consider

f([X|_], [Y|_]) when X < Y -> ...

how do you write _that_ without commas? Good ideas (like abstract patterns)
work in general cases.
unknown
2003-09-22 07:29:05 UTC
Permalink
Hi Gurus,

After spending a weekend thinking about totally unrelated things, and
now that my mind is somewhat refreshed, my opinion is _NO_, please don't
change the notation. The only "gain" is a change in syntax. I'm
perfectly happy to read the "English" version.

As for optimisation, where I've needed high speed bit-crunching I've
resorted to C and C-node servers. This kinda dodges the
language-improvement issue, but I'm simply being pragmatic, and it has
worked very well for what I've done. Speed in Erlang isn't a big issue
for me.

As for type-checking and code-correctness a code analyser would be a
wonderfully powerful design tool. Erlang already has the remarkable
quality of expressing algorithms clearly. I make lots of syntactic
errors when writing (whither semicolon?), but once I've cleaned them up,
most of the time the code semantics are correct and work as desired. Yay!

But in practice I've rarely had such obscure problems as "how did an
atom end up being used as a number?". Maybe the excellent error trapping
and recovery mechanisms in Erlang could be to "blame" for this? I found
debugging Erlang code is relatively easy, because the error reports are
exact and meaningful, and the programs don't go quietly mad before
failing miles down the execution path.

Pete.
Post by unknown
foo(X == 2 * Y)
% foo(X, Y) when X == 2*Y
foo(X > Y)
% foo(X, Y) when X > Y
am I the only one who is bothered (to put it politely) that the
"improved" versions _look_ as if they have one argument, but are
supposedly equivalent to functions with _two_ arguments, and that
there's no obvious way to tell which argument is which?
For some reason, Haskell programmers don't seem to mind writing
foo x y | x == 2*y =
...
foo x y | x < y =
...
I also don't think it's any more difficult to read, once you
unlearn the arbitrary convention that "comma separates arguments".
It may be arbitrary, but so are all human communication conventions,
and this one has been going strong in programming languages for 50 years.
"As a language design idea in its own right", I very much dislike a
notation where I can't tell which argument is which.
Now let's consider
f([X|_], [Y|_]) when X < Y -> ...
how do you write _that_ without commas? Good ideas (like abstract patterns)
work in general cases.
unknown
2003-09-22 14:54:58 UTC
Permalink
On Mon, 22 Sep 2003 18:13:08 +1200 (NZST)
Post by unknown
foo(X == 2 * Y)
% foo(X, Y) when X == 2*Y
foo(X > Y)
% foo(X, Y) when X > Y
am I the only one who is bothered (to put it politely) that the
"improved" versions _look_ as if they have one argument, but are
supposedly equivalent to functions with _two_ arguments, and that
there's no obvious way to tell which argument is which?
No, it also seemed to bother Klacke very much :)

But what could be a more obvious way to tell which argument is which,
than their order? in "X > Y", X appears first, so it's the first
argument.
Post by unknown
For some reason, Haskell programmers don't seem to mind writing
foo x y | x == 2*y =
...
foo x y | x < y =
...
I also don't think it's any more difficult to read, once you
unlearn the arbitrary convention that "comma separates
arguments".
It may be arbitrary, but so are all human communication conventions,
and this one has been going strong in programming languages for 50
years."As a language design idea in its own right", I very much
dislike a notation where I can't tell which argument is which.
Stick-in-the-mud! You're just jealous 'cuz I'm an "early adopter." :)

Seriously, you have every right to dislike it as much or as little as
you like since aesthetics are subjective and it's just another
variation on notation. (That's probably the main reason people cringe
at Forth and Lisp code too, it's "impossible to read" (until you learn
to read it :))

But I don't think you can show that this notation is not more concise.
Which, I believe, was the main thrust of the original thread (that
A/integer is more concise than (A) when is_integer(A).)
Post by unknown
Now let's consider
f([X|_], [Y|_]) when X < Y -> ...
how do you write _that_ without commas?
You don't, because I didn't say "never use commas," I said "commas
become 'and's" (or something very close to 'and') You write it like
this:

f([X | _], [Y | _], X < Y) ->

-Chris
unknown
2003-09-22 08:40:46 UTC
Permalink
Post by unknown
"Fredrik Linder" <fredrik.linder> However, by allowing
only a limited set of operations in guard-declared
functions (such as only allowing them to call other guard-functions)
could
be the fine line that increases usability while still preserving the
consistency of the program.
The usage I am targeting is to have user-defined guards as
inspectors/selectors on ADTs.
Except for the unnecessarily ugly syntax and not being quite as fully
worked out, how is this different from my old "Abstract Patterns"
proposal?
I can't seem to find this proposal online. Link anyone?

Sean
unknown
2003-09-22 15:05:42 UTC
Permalink
Post by unknown
Post by unknown
Except for the unnecessarily ugly syntax and not being quite as fully
worked out, how is this different from my old "Abstract Patterns" proposal?
I can't seem to find this proposal online. Link anyone?
I have a copy. I could post it on the list, if Richard does not have
any objections.


Sven-Olof
unknown
2003-09-23 09:17:57 UTC
Permalink
Post by unknown
Post by unknown
I can't seem to find this proposal online. Link anyone?
I have a copy. I could post it on the list, if Richard does not have
any objections.
See below.

Sven-Olof





Pattern Abstraction

Richard O'Keefe


There is one serious defect which Prolog has never overcome, although
successor languages such as Mercury have addressed it. It is a defect
which languages such as SML, Miranda, Haskell, and Clean do not have,
although it is shared by Scheme and Lisp generally.

The defect is that Prolog has no support for abstract data types. In
fact, Prolog pretty well goes out of its way to discourage abstract data
types. Everything in Prolog is a term, and a term is either a variable
or a functor applied to arguments which are themselves terms. Not only
are there primitive operations (var/1, nonvar/1, functor/3, arg/3) which
can be used to compose and decompose any data structure whatever, so
that abstraction cannot be enforced, Prolog's normal mode of dealing
with data is pattern matching, which means that abstraction is
penalised.

The classic example is list processing. One can quite easily set up an
abstract data type list(T) with operations

null([]).

pair([H|T], H, T).

and then write

append(X, Y, Y) :- null(X).
append(X, Y, Z) :- pair(X, H,U), pair(Z, H,V), append(U, Y, V).

instead of

append([], Y, Y).
append([H|X1], Y, [H|Z1]) :- append(X1, Y, Z1).

but then one has not merely the overhead of calling extra predicates,
but the rather disastrous loss of first-argument indexing. All Prolog
really needs to solve this problem is a form of predicate definition
that is guaranteed to be expanded inline, but this is not one of the
matters that the ISO committee chose to address (not the least of the
reasons why I have harsh words to say about that committee and that
standard). Another thing that would restore indexing would be to use
negative clauses such as

<- null(L), pair(L, _, _).

to declare that null/1 and pair/3 are mutually exclusive. SB-Prolog
has for a long time used such knowledge about built-in predicates
(notably arithmetic comparison and term comparison) to allow a programmer
to write e.g.

max(X, Y, X) :- X > Y.
max(X, Y, Y) :- X =< Y.

and get mutual exclusion of clauses without cuts. However, as Erlang
has the typical functional ``commit to first successful match'' rule,
it doesn't need negative clauses to get exclusion.

Erlang has Prolog-like problems with abstract data types. First, all
Erlang data structures are terms, built from variables, constants,
pairs, and tuples, and there are primitive features of the language for
composing and decomposing terms, so it is quite impossible to enforce
encapsulation, at least, not without imposing a type system. Second,
Erlang too normally operates on data structures by means of patterns,
which literally make the structure of data visible. Unlike Prolog,
Erlang does not rely on first-argument (or any other) indexing, but it
does not (and should not) allow general user-defined functions in
guards, so something like

append(X, Y) when empty(X) -> Y;
append(X, Y) when {H,T} = pair(X) -> make_pair(H, append(T,Y)).

would not be allowed. There are good reasons for this, and I do not
want to overturn this feature of Erlang. But we do need something to
put in its place if we are to support even weak data abstraction, and we
need data abstraction very much if we are going to have 10MSLOC programs
that we can hope to trust. (I note that a large chunk of New Zealand
was without emergency telephone service for several days in the week
just ending, due to a software problem. Probably written in C. (:-))

Note that the disallowed append/2 code I showed above has a rather
unpleasant property, which the inlined-simple-predicates approach in
Prolog is thankfully free of: composing a pair and decomposing a pair
use different forms. This increases the burden on the programmer.
There is no symmetry at all that can be exploited to make the
correspondence easier to remember.

There is another major property which a good solution to the problem
should have: it should be fully compatible with an entire absence of
any preprocessor. At the end of this note I shall explain what is badly
wrong with records in Erlang.

The pattern abstraction proposal is quite simple. There are three ways
an abstract pattern can be used:
- to compose a term
- to decompose a term
- to update a term (that is, compose a new term similar to an old one with
some differences).
A pattern definition is basically a restricted kind of function
definition, which is rewritten to define a composition function, a
decomposition function, and an update function. Each of the three ways
of using an abstract pattern is defined by rewriting to a call to one of
the three derived functions.

Abstract patterns begin with a sharp, like records. Unlike records,
they use round parentheses instead of curly braces. Also unlike
records, they are executable code, not a kind of macro, so they may come
from modules other than the lexically enclosing one. A pattern
definition is a restricted function definition whose heads are decorated
with a sharp and whose bodies are patterns (possibly involving other
abstract patterns).

definition +:= patclause {';' patclause}... '.'

patclause ::= '#' head ['when' guard] '->' pattern

pattern +:= '#' [module ':'] name '(' [patterns] ')'

primary +:= '#' [module ':'] name '(' [expressions] ')'
| primary '#' [module ':'] name '(' [expressions] ')'


Here's append/2 using abstract patterns.

#empty() -> [].
#pair(H,T) -> [H|T].
#head(H) -> [H|_].
#tail(T) -> [_|T].

append1(#empty(), Y) -> Y;
append1(#pair(H,T), Y) -> #pair(H,append1(T,Y)).

append2(#empty(), Y) -> Y;
append2(#head(H)=#tail(T), Y) -> #pair(H,append2(T,Y)).

What does this mean? A pattern definition basically defines three
functions: a function to use for composition, a function to use for
decomposition, and a function to use for update. For the sake of
exposition, let's name the composition function for #x/n '#x'/n, the
decomposition function 'n=x'/1, and the update function '!x'/n+1.

'#empty'() -> []. % composition function
'0=empty'([]) -> {}. % decomposition function
'!empty'([]) -> []. % update function

'#pair'(H,T) -> [H|T].
'2=pair'([H|T]) -> {H,T}.
'!pair'([_|_], H,T) -> [H|T].

'#head'(H) -> error(). % useless composition function
'1=head'([H|_]) -> H.
'!head'([_|T], H) -> [H|T].

'#tail'(T) -> error(). % useless composition function
'1=tail'([_|T]) -> T.
'!tail'([H|_], T) -> [H|T].

append1(X, Y) when {} = '0=empty'(X) -> Y;
append1(X, Y) when {H,T} = '2=pair'(X) ->
'#pair'(H, append1(T,Y)).

append2(X, Y) when {} = '0=empty'(X) -> Y;
append2(X, Y) when H = '1=head'(X), T = '1=tail'(X) ->
'#pair'(H, append2(T,Y)).

After inline expansion, these two definitions would be identical to each
other and to the usual definition.

Suppose we are given a pattern definition

#p(P11,...,P1n) when G1 -> Q1;
...
#p(Pk1,...,Pkn) when Gk -> Qk.

The composition function is obtained very simply by turning #p into '#p'.

The decomposition function basically switches the left and right sides:

'n=p'(Q1) when G1 -> {P11,...,P1n};
...
'n=p'(Qk) when Gk -> {Pk1,...,Pkn}.

However, for the sake of field selection notation, and for efficiency,
the translation is simplified when n=1:

'1=p'(Q1) when G1 -> P11;
...
'1=p'(Qk) when Gk -> Pk1.

The update function has to copy the bits that aren't mentioned and
replace the bits that are. Define Q'i to be Qi with anonymous variables
replaced by new variables, Q"i and G"i to be Q'i and Gi with the
original named variables consistently replaced by new variables. Then
we get

'!p'(Q"1,P11,...,P1n) when G"1 -> Q'1;
...
'!p'(Q"k,Pk1,...,Pkn) when G"k -> Q'k.

All of these translations have to make sense. In particular:
1. every variable used in a guard must be defined in both the
corresponding argument patterns and replacement pattern.
2a. If the composition function is to be usable, Qi must be fully
determined by Pi1...Pin and Gi, that is, have no variable,
anonymous or otherwise, that is not assigned a value by them.
2b. On the other hand, if an update function is to be useful, there
should be some anonymous variables in Qi, or else why not
just use a composition function?
3. Pij must be fully determined by Qi and Gi, that is, have no variable,
anonymous or otherwise, that is not assigned a value by them.

This actually turns out to be too strict. There is no point in having
an abstract pattern that can only be used for composition; we already
have perfectly good functions for that. But there is some sense in
having an abstract pattern that can only be used for decomposition.
Accordingly, if condition 2a is violated by the ith clause, that clause
of the composition function becomes

'#p'(Pi1,...,Pin) when Gi -> error(...)

When an abstract pattern is invoked as part of a pattern,

#[m:]p(P1,...,Pn) = E

it is as if

{P1,...,Pn} = [m:]'n=p'(E)

were invoked instead, if n~=1, or as if

P1 = [m:]'1=p'(E)

were invoked, if n=1. Since abstract patterns are state independent and
have no side effects, it doesn't matter whether the translation
preserves left to right order or not; suffice it that pattern match
translation can preserve left to right order if we want. If #p/1 is an
abstract pattern, it may also be invoked as a field reference:

E0#p

is as if

'1=p'(E0)

had appeared instead. This applies to guard expressions as well as
general expressions. Note that E#p is different from #p(E).
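
To restate the translations already given with the list patterns from
above (nothing new here, just the four uses side by side):

H = L#head,        %% field reference: H = '1=head'(L)
#head(H) = L,      %% match: H = '1=head'(L)
P = #pair(1, []),  %% composition: P = '#pair'(1, [])
P2 = P#tail([2]).  %% update: P2 = '!tail'(P, [2]), giving [1,2]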

When an abstract pattern is invoked as part of an expression,

#[m:]p(E1,...,En)

it is as if

[m:]'#p'(E1,...,En)

were invoked instead. This clearly preserves the order and embedding of
expressions, so left to right execution order is not at all disturbed.

When an abstract pattern is invoked for update,

V = E0#[m:]p(E1,...,En)

it is as if

V = [m:]'!p'(E0, E1,...,En)

were invoked instead. The only way this could perturb left to right
ordering is if the module prefix involved an expression other than a
variable or atom, and in that case the translation

(ET = E0, V = m:'!p'(ET, E1, ..., En))

can be used (assuming that Erlang does or will allow arbitrary primaries
as module prefixes) where ET is a new variable. The preservation of
left to right order is why the `base datum' argument of an update
function is the first rather than the last.

There is one further condition. At the moment, if P is any pattern, and
P = P makes sense, (mainly, P contains no unbound variables), we expect
it to be true. But consider

#p(1) -> [];
#p(2) -> [].

Y = #p(2), #p(2) = Y

will fail, because it turns into

'#p'(1) -> []; '#p'(2) -> [].
'1=p'([]) -> 1; '1=p'([]) -> 2.
'!p'([],1) -> []; '!p'([],2) -> [].

Y = '#p'(2), 2 = '1=p'(Y)

The conditions we need are easy to state
4. There does not exist any term T such that
Qi = T, Gi ==> true
Qj = T, Gj ==> true
i ~= j.
5. There do not exist any terms T1,...,Tn such that
Pi1 = T1, ..., Pin = Tn, Gi ==> true
Pj1 = T1, ..., Pjn = Tn, Gj ==> true
i ~= j
but are less easy to check. The example clearly violates condition 4.
The difficulty of checking these conditions (a decision procedure for
arithmetic appears to be required) would at first sight appear to be a
problem, but in fact there are good reasons not to impose them.

Now let's look at an example.

----------------------------------------------------------------
Queue package example.

We'll implement queues twice.

-module(queue).
-export([#empty/0,#nonempty/0,push/2,pop/1]).

#empty() -> [].

#nonempty() -> [_|_]. % violates condition 2

push(Q, X) -> append(Q, [X]).

pop([X|Q]) -> {X,Q}.

%end.


-module(queue).
-export([#empty/0,#nonempty/0,push/2,pop/1]).

#empty() -> {[],[]}.

#nonempty() -> {[_|_],_}; % violates condition 2
#nonempty() -> {[],[_|_]}. % violates condition 2

push({L,R}, X) -> {L,[X|R]}.

pop({[X|L],R}) -> {X,{L,R}};
pop({[],R}) -> [X|L] = lists:reverse(R), {X,{L,[]}}.

%end.


-module(bfs).
-export(bfs/1).
-import(problem, [solved/1,children/1]).
-import(queue, [#empty/0,#nonempty/0,push/2,pop/1]).

bfs(Start) ->
loop(push(#empty(), Start)).

loop(#empty()) -> undefined;
loop(Q = #nonempty()) ->
{X,P} = pop(Q),
case solved(X) of
true -> X;
false -> loop(pushall(P, children(X)))
end.

pushall(Q, []) -> Q;
pushall(Q, [H|T]) -> pushall(push(Q,H), T).

%end.


Here's the translation of the bfs module:

-module(bfs).
-export(bfs/1).

bfs(Start) ->
loop(queue:push(queue:'#empty'(), Start)).

loop(Q) when {} = queue:'0=empty'(Q) ->
undefined;
loop(Q) when {} = queue:'0=nonempty'(Q) ->
{X,P} = queue:pop(Q),
case problem:solved(X) of
true -> X;
false -> loop(pushall(P, problem:children(X)))
end.

pushall(Q, []) -> Q;
pushall(Q, [H|T]) -> pushall(queue:push(Q,H), T).

%end.

I have argued elsewhere that there should be two directives

-use_early(module, [used functors]).
-use_late(module, [used functors]).

which indicate that specified functors from a specified module are used,
the former indicating early binding, neither of them indicating that a
name is imported into the local scope. One reason for that was to
capture the good aspect of the preprocessor, which is precisely early
binding. But note that if we used the preprocessor to hide the
implementation of queues from the programmer while revealing it to the
compiler, we would have to use early binding, while with abstract
patterns, that's an independent choice. The bfs module can use #empty()
and #nonempty() as patterns without requiring early binding.

Note that replacing one version of the queue module by another while
bfs/1 was running would not be a good idea. This seems to be quite
typical, and I do not understand how Erlang avoids serious trouble. The
problem is that once the bfs/1 function has received a data structure
from the queue module, it depends on the queue module still accepting
the same data structure, so the process running bfs/1 depends on the
queue module even when it is not calling a function from that module.
There really needs to be some sort of locking facility so that bfs/1
could be written

bfs(Start) ->
with_module_lock(queue,
loop(push(#empty(), Start))).

However, if we are aware that there are long-lived processes that have
data structures of the old form that need to be supported for a while,
we might choose to write a queue module that supports both forms, while
preferring the new one, and it is for this reason that abstract patterns
may have more than one clause. (Although the second version of the
queue modules needs this anyway.) Here goes:

-module(queue).
-export([#empty/0,#nonempty/0,push/2,pop/1]).

#empty() -> {[],[]};
#empty() -> []. % violates condition 5

#nonempty() -> {[_|_],_};
#nonempty() -> {[],[_|_]};
#nonempty() -> [_|_]. % violates condition 5

push({L,R}, X) -> {L,[X|R]};
push(Q, X) -> append(Q, [X]).

pop({[X|L],R}) -> {X,{L,R}};
pop({[],R}) -> [X|L] = reverse(R), {X,L};
pop([X|Q]) -> {X,Q}.

%end.

Try doing that with macros!

Of course, macros provide early binding and inlining in one easy step,
which may (but need not) be good for efficiency. If an abstract pattern
is defined in the module that uses it, or if -use_early is defined, then
early binding occurs, and inlining is a possibility. The option of
inlining may be better than the certainty of inlining. Why?

The first, and perhaps most obvious, reason is that abstract patterns
are named code forms with well defined invocation and contents, just
like functions. The fact that they can be used without requiring
inlining is good news for debuggers and interpreters, not to mention
test coverage analysers and profilers.

The second is that inlining doesn't always help efficiency. There's no
point in executing 10% fewer instructions if, thanks to cache effects,
it takes 20% longer to do so. Back in the late 80s at Quintus, a
certain competitor claimed that their native-code Prolog system
generated code that ran twice as fast as ours. So it did, on naive
reverse. On a two-page program (which didn't fit into the 68020's
on-chip instruction cache), it did no better than our threaded-code
system. On a real program, it was twice as slow. I have heard that the
same phenomenon occurred with BEAM -vs- JAM. Inlining can have similar
effects, though less severe.

One extension to Erlang would make it easier to inline abstract
patterns, and that is disjunctive guards. Currently, we have

guard ::= ... '(' guard ')'

replace that by

guard ::= ... '(' guard {'|' guard}... ')'

with the obvious semantics. This would be useful anyway.
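
For instance, with this extension one could write directly (proposed
syntax, not current Erlang):

positive(X) when (integer(X) | float(X)), X > 0 -> true;
positive(_) -> false.

which is just the shape needed when a multi-clause pattern such as
#nonempty() is inlined into a single clause.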

----------------------------------------------------------------
What's wrong with records?

Records tangle up several things:
1 the possibility of naming a pattern
2 field and update notation.
3 keyword and default arguments
4 a very specific data structure
5 a global name space
6 the possibility of unchecked conflicting definitions in that namespace
7 a requirement to use the preprocessor to get weak abstraction
8 syntax that includes what LOOK like pattern matches but aren't.

Some of these are good (1, 2, 3), some of them are less good (4), some
of them are bad (5 and 8) and some of them are very bad (6 and 7).
Pattern abstraction provides (1 and 2) without any of the others.

What's wrong with (4)? First, it means that records actually give us no
abstraction at all; in the Erlang Standard specification, to be a record
of type T is precisely to be a tuple of a particular arity whose first
element is the atom T. Second, if we want to give a name to some other
structure, we are sunk, as records buy us nothing. Here is a realistic
example. Suppose we want to manipulate timestamps. For compatibility
with current Erlang, we might want these timestamps to contain embedded
date and time values. So

#timestamp() -> {_, _}.
#timestamp(DT,TM) -> {DT, TM}.
#timestamp(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.

#time_date(DT) -> {DT,_}.
#time_date(Y,M,D) -> {{Y,M,D},_}.

#time_time(TM) -> {_,TM}.
#time_time(H,I,S) -> {_,{H,I,S}}.

#time_year(Y) -> {{Y,_,_},_}.
#time_month(M) -> {{_,M,_},_}.
#time_day(D) -> {{_,_,D},_}.
#time_hour(H) -> {_,{H,_,_}}.
#time_minute(I) -> {_,{_,I,_}}.
#time_second(S) -> {_,{_,_,S}}.

Eventually, we purge our code of explicit references to the patterns,
and can switch to a more space-efficient representation:

#timestamp() -> {_,_,_,_,_,_}.
#timestamp({Y,M,D},{H,I,S}) -> {Y,M,D,H,I,S}.
#timestamp(Y,M,D,H,I,S) -> {Y,M,D,H,I,S}.

#time_date({Y,M,D}) -> {Y,M,D,_,_,_}.
#time_date(Y, M, D) -> {Y,M,D,_,_,_}.

#time_time({H,I,S}) -> {_,_,_,H,I,S}.
#time_time(H, I, S) -> {_,_,_,H,I,S}.

#time_year(Y) -> {Y,_,_,_,_,_}.
#time_month(M) -> {_,M,_,_,_,_}.
#time_day(D) -> {_,_,D,_,_,_}.
#time_hour(H) -> {_,_,_,H,_,_}.
#time_minute(I) -> {_,_,_,_,I,_}.
#time_second(S) -> {_,_,_,_,_,S}.

What do we have here? A change of representation. What is it that
records do not allow you to vary? The representation (except within
very narrow limits).

Just to resume a topic that may not have been clear before, here are the
functions we get for the first of these definition sets.

'#timestamp'() -> error().
'0=timestamp'({_,_}) -> {}.
'!timestamp'({X1,X2}) -> {X1,X2}.

'#timestamp'(DT, TM) -> {DT, TM}.
'2=timestamp'({DT,TM}) -> {DT, TM}.
'!timestamp'({X1,X2}, DT, TM) -> {DT,TM}.

'#timestamp'(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.
'6=timestamp'({{Y,M,D},{H,I,S}}) -> {Y,M,D,H,I,S}.
'!timestamp'({{X1,X2,X3},{X4,X5,X6}},Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.

'#time_date'(DT) -> error().
'1=time_date'({DT,_}) -> DT.
'!time_date'({X2,X1}, DT) -> {DT,X1}.

'#time_date'(Y,M,D) -> error(). % useless composition function
'3=time_date'({{Y,M,D},_}) -> {Y,M,D}.
'!time_date'({X2,X1}, Y, M, D) -> {{Y,M,D},X1}.

'#time_time'(TM) -> error().
'1=time_time'({_,TM}) -> TM.
'!time_time'({X1,X2}, TM) -> {X1,TM}.

'#time_time'(H,I,S) -> error().
'3=time_time'({_,{H,I,S}}) -> {H,I,S}.
'!time_time'({X1,X2}, H, I, S) -> {X1,{H,I,S}}.

'#time_year'(Y) -> error().
'1=time_year'({{Y,_,_},_}) -> Y.
'!time_year'({{X4,X1,X2},X3}, Y) -> {{Y,X1,X2},X3}.

'#time_month'(M) -> error().
'1=time_month'({{_,M,_},_}) -> M.
'!time_month'({{X1,X4,X2},X3}, M) -> {{X1,M,X2},X3}.

'#time_day'(D) -> error().
'1=time_day'({{_,_,D},_}) -> D.
'!time_day'({{X1,X2,X4},X3}, D) -> {{X1,X2,D},X3}.

'#time_hour'(H) -> error().
'1=time_hour'({_,{H,_,_}}) -> H.
'!time_hour'({X1,{X4,X2,X3}}, H) -> {X1,{H,X2,X3}}.

'#time_minute'(I) -> error().
'1=time_minute'({_,{_,I,_}}) -> I.
'!time_minute'({X1,{X2,X4,X3}}, I) -> {X1,{X2,I,X3}}.

'#time_second'(S) -> error().
'1=time_second'({_,{_,_,S}}) -> S.
'!time_second'({X1,{X2,X3,X4}}, S) -> {X1,{X2,X3,S}}.

So if, for example, we wanted to refer to noon today, or to find what
year it is, we could write

Noon_Today = timestamp:now()#time_time(12,0,0),
This_Year = timestamp:now()#time_year

and these would work whether a timestamp was stored as one tuple or
three. The -record approach cannot be used with the three tuple
version. (It can't be used with the one tuple version either, because
there must be an atom serving as a tag, and it must be the first element
of the tuple, and it must be the same as the record name.)

What about keyword and default arguments (3)? Are they not a benefit of
using records? Yes, they are. But they should be available for all
functions and abstract patterns, not just records. Perhaps it is unfair
to say ``ask an Ada programmer'', because Ada doesn't use data the way
Erlang does. But Ada programs do have lots of procedure and function
calls, just as Erlang does, and Erlang functions need more arguments
than Ada functions, for the same reasons as Prolog, so need keyword and
default arguments more than Ada does. I intend that abstract patterns
should use whatever method is used to provide that feature for general
functions.

Do records really imply a global namespace? Yes, because the record
name appears inside the record. Consider:

-module(a).
-record(r, {X,Y,Z}).
...
f(..R..) when record(R, r) ->

-module(b).
-record(r, {X,Y}).
g(R) when record(R, r) -> f(R).
... g(#r{1,2}) ...

It is not clear exactly what the record/2 guard test should do. One
manual says "This test checks that R is a tuple of arity N+1, where N is
the number of fields in the record, and the first element in the tuple
is the" atom naming the record. This means that g/1 WILL believe that
record(R, r) is true, but f/1 will NOT. That does mean that f/1 will not
be tricked into working with something that doesn't match its idea of
what an `r' record looks like, but programmers will find it very
confusing that record(R,r) can succeed in one module and fail in another.
It is also a nuisance that we cannot tell whether something "is a record
of type r" without having a -record declaration in scope. That doesn't
apply to any other guard test.

The real problem comes when a and b agree on the arity of r but disagree
on the argument order or interpretation. Consider again,

-module(a).
-record(r, {X,Y,Z}).
...
f(...#r(X,Y,Z)...) -> ...

-module(b).
-record(r, {Z,Y,X}).
...
f(...#r(Z,Y,X)...)

Either a global namespace for records is enforced (so that the record/2
guard test works consistently), or serious confusion may result. The
snag here is that a record name ought to be `owned' by some module, but
in fact it is `owned' by a preprocessor file.

How do pattern abstractions address this problem? In part, it must be
admitted, they don't. They are weak abstractions; there is no defence
against forgery, and little defence against accident other than adding a
tag, just like records. The advantage comes from the fact that an
abstract pattern is clearly owned by a specific module; there is one
declaration, not multiple copies of a declaration. Abstract pattern
names are module qualified, just like other function names.

There is something that could be done that would make accident unlikely
and forgery slightly harder. Records differ from tuples in that they
are tagged. The tags cannot be module-specific, because the same
declaration may be copied into many modules, with no clue as to which is
the owner. All we do is take the idea of tagging, and bend the existing
module-specific literal syntax (like ?VERSION and ?MODULE) a little bit.
The directive

-unique(name).

would mean that at load time a new ref() would be created and bound to
name, in such a way that the name can be recovered from the ref() for
debugging purposes, and at compile time each occurrence of ?name would
refer to that ref(). (This is like the classic Lisp hack of using a
unique cons cell or function closure as a tag.) Then we could write

-unique(ts).

#timestamp() -> {?ts,_, _}.
#timestamp(DT,TM) -> {?ts,DT, TM}.
#timestamp(Y,M,D,H,I,S) -> {?ts,{Y,M,D},{H,I,S}}.

#time_date(DT) -> {?ts,DT,_}.
#time_date(Y,M,D) -> {?ts,{Y,M,D},_}.

#time_time(TM) -> {?ts,_,TM}.
#time_time(H,I,S) -> {?ts,_,{H,I,S}}.

#time_year(Y) -> {?ts,{Y,_,_},_}.
#time_month(M) -> {?ts,{_,M,_},_}.
#time_day(D) -> {?ts,{_,_,D},_}.
#time_hour(H) -> {?ts,_,{H,_,_}}.
#time_minute(I) -> {?ts,_,{_,I,_}}.
#time_second(S) -> {?ts,_,{_,_,S}}.

This is admittedly clumsy, but it's the idea I want to push, not the
syntax, which is at least local to the owning module. This looks a bit
like the preprocessor, but it only requires recognising ?<id> as a
special form. If we left the question mark off (and the binding to a
ref()) we would have the same insecure approach as records, without the
requirement of a fixed location for the tag.

Thanks to preprocessor use, that insecurity is real. The really nasty
thing happens when module a gets its record declaration from foo.hrl,
module a is compiled and loaded, foo.hrl is changed, and another module
including (the revised version of) foo.hrl is compiled and loaded. We
have a dependency between module a and foo.hrl which is not noticed;
partly because foo.hrl is not a module and is not, as such, loaded, and
partly because the dependency exists even while the code in foo.hrl is
not running.

So we are left with only one thing that abstract patterns cannot do as
well as or better than records, and that is keyword and default
parameters.
At one time it was expected that records would become new types; this
seems to have been dropped from the current Erlang Standard
specification (0.6); I surmise because of the difficulty of establishing
an `owner'.

There is one more thing that abstract patterns can do that records as
currently defined in Erlang cannot do, and that is enforce type
constraints on components. Consider triangles. We'd like a triangle to
be represented by a `record' containing three floats. Using record
notation, all we can do is write

-record(triangle, {a, b, c}).

and hope. Using abstract pattern notation, we can write

#triangle(A,B,C) when float(A), float(B), float(C) -> {A,t,B,C}.

and this will enforce the type constraints on the components. Here is a
sample use of this pattern, to see whether a value of this type is a
plausible triangle.

plausible(#triangle(A,B,C))
when 0 < A, A =< B+C,
0 < B, B =< C+A,
0 < C, C =< A+B
-> true;
plausible(_) -> false.

After translation and inline expansion, plausible/1 becomes

plausible({A,t,B,C}) % check form
when float(A), float(B), float(C), % check types
0 < A, A =< B+C,
0 < B, B =< C+A,
0 < C, C =< A+B
-> true;
plausible(_) -> false.

This ability to propagate type information/checking from the declaration
to the uses, if the programmer wants it, is an important safety feature
that is completely missing from Erlang records, and is obtained without
any new runtime machinery. Note that an Erlang compiler can easily
deduce from the use of A, B, C that they must be some kind of number,
but not which kind, not without interprocedural data flow analysis.
Even without inlining, we get the runtime type checking we choose in the
declaration, which has to be helpful if we want to build large systems.

The End.

The abstract pattern notation neatly replaces both the record
declarations and the use of the preprocessor.

Records let you say

L2 = L#log{blocked_by = none, status = ok}

which is arguably prettier than

L2 = L#log_blocked_by(none)#log_status(ok)

but not hugely so. One could argue that the abstract pattern
version has the merit of not containing a manifestly false
pattern match 'status = ok'.

Records automatically provide you with a full set of record.field
forms; abstract patterns for them have to be separately declared.
On the other hand, records *force* you to accept a full set of
record.field forms even if you want some of them blocked, and it
is not at all clear how import/export of #record.field forms would
be controlled.

Records permit patterns like

#log{status = S, users = N} = L

which is arguably prettier than

#log_status(S) = #log_users(N) = L

but again, the latter never contains manifestly absurd matches.

Dynamic Patterns.

The parallel between abstract patterns and functions can be stretched a
little further. In effect, we can regard a value of type
(T1,...,Tn) => T0
[where => is the pattern arrow in distinction to the function arrow ->] as
a record
( b : (T1,...,Tn) -> T0 % building
u : (T0,T1,...,Tn) -> T0 % updating
d : T0 -> (T1,...,Tn) % decomposing
)
and
X = #p(E1,...,En) ==> X = p.b(E1,...,En)
X = E0#p(E1,...,En) ==> X = p.u(E0,E1,...,En)
#p(P1,...,Pn) = E ==> {P1,...,Pn} = p.d(E)
Since a triple of functions can be passed as a value, why not allow a
pattern to be passed as a value? Let's put this in Erlang terms.

Define
#E0(E1,...,En)
where E0 is not an atom or a module:atom pair to be
#apply(E0, [E1,...,En])
Define
#apply(Mod,E0,EL)
to be
#apply({Mod,E0}, EL)
Now we have to define the use of #apply/2.

X = #apply(MF,L)
where MF = {M,F}, L = [T1,...,Tn], and M, F are atoms
reduces to X = #M:F(T1,...,Tn), i.e. X = M:'#F'(T1,...,Tn).
where MF is the result of (fun #F/N) executed statically within
module M, L = [T1,...,Tn], and N = n,
reduces to X = #M:F(T1,...,Tn)

X = E0#apply(MF,L)
where MF = {M,F}, L = [T1,...,Tn], and M, F are atoms
reduces to X = E0#M:F(T1,...,Tn) i.e. X = M:'!F'(E0,T1,...,Tn)
where MF is the result of (fun #F/N) executed statically within
module M, L = [T1,...,Tn], and N = n,
reduces to X = E0#M:F(T1,...,Tn).

#apply(MF, [P1,...,Pn]) = E
where MF is statically a GuardExpression whose value is dynamically
{M,F} for M and F atoms, and [P1,...,Pn] is statically a list of Patterns,
reduces to #M:F(P1,...,Pn) = E, i.e. {P1,..,Pn} = M:'n=F'(E).
where MF is statically a GuardExpression whose value is dynamically
the result of (fun #F/N) executed statically within a module M,
[P1,...,Pn] is statically a list of Patterns, and N = n,
reduces to #M:F(P1,...,Pn) = E.

Now we can do things like this:

proj(P, []) -> [];
proj(P, [#P(X)|Xs]) -> [X|proj(P,Xs)].

#unit_list(X) -> [X].
#unit_tuple(X) -> {X}.

... proj(fun #unit_list/1, L), ...
... proj(fun #unit_tuple/1, LT), ...

By bending explicit fun syntax a little bit, we can even get anonymous
patterns:
fun #( Pattern... ) when Guard -> Pattern
; #( Pattern... ) when Guard -> Pattern ...
end
e.g.
fun #(X) -> [X|_] end
is an anonymous `hd' pattern. (The syntax here is based on two rules:
to get pattern syntax from function syntax, put # in front of the
[Module:]Name, and to get explicit fun syntax, put `fun' `end' around
the clauses and delete the function Name.) It's not that I think
anonymous patterns are wildly useful; the point is that if they weren't
at least thinkable the parallels between patterns and functions would be
spoilt, to the detriment of our ability to reason correctly about them.

Try doing that with records!

As a matter of fact, dynamic patterns do something for us that isn't
easy to get any other way. Look at proj/2 again. Is there any
nonsyntactic difference between

proj(P, []) -> [];
proj(P, [#P(X)|Xs]) -> [X|proj(P, Xs)].

and

map(F, []) -> [];
map(F, [X|Xs]) -> [F(X)|map(F, Xs)].

or is it just a piece of `cuteness' that lets you put the function call
on the `wrong' side?

Yes, there is a difference. The function F in map/2 may do anything at
all; it may have side effects, it may allocate arbitrarily large amounts
of storage, it may run for huge amounts of time. The pattern P in
proj/2 must be an abstraction of a pattern-and-guard: it may not send
or receive messages, have any other side effects, allocate huge amounts
of storage, or take unbounded amounts of time. It is forced to be well
behaved in ways that an arbitrary function parameter is not. Indeed, we
are sure that the value X it returns in proj/2 is part of the term it
matches, so it really is a projection function. If we want to reason
about the effects of a code fragment, proj/2 is significantly easier to
reason about than map/2. This means that anonymous patterns do have a
use:

proj(fun #(X) -> {X,_,_} end, L)

is trivially well-behaved, whereas

map(fun ({X,_,_}) -> X end, L)

requires more analysis to see its harmlessness. (Of course, checking
higher order functions is never entirely trivial, and it is something of
a pity that they were added to Erlang so hastily. Topic for another
working paper.)

When you've invented a language construct, a good sign that you are onto
something worthwhile is that applications come faster than you have time
to write them down. When I wrote the stuff about dynamic abstract
patterns, I didn't have an application in mind. But I suddenly realised
that there was a proposal I thought of last year which had been languishing
because there was no safe way to do it.

Erlang processes are vulnerable to a denial-of-service attack where a
malicious or erroneous process repeatedly sends "junk mail". The Erlang
documentation recommends that most or all 'receive' commands should
include a wildcard case to discard junk mail, but that has the disadvantage
that it won't discard junk mail until the process actually executes a
receive. Junk mail can pile up while a process is suspended. If nothing
else, it causes the recipient of the junk mail to pay (in processing and
garbage collection time) to inspect it.
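As a minimal sketch of that recommendation (handle/2 is a hypothetical
request handler, not a real API):

loop(State) ->
    receive
        {request, From, Req} ->
            From ! {reply, handle(Req, State)},
            loop(State);
        _Junk ->
            %% wildcard clause: junk mail is thrown away, but only
            %% when the process actually gets here
            loop(State)
    end.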

Last year I came up with the idea of a 'message filter'.

set_process_filter(fun (Msg) -> ... true | false end)

would mean that whenever a (local) process sends a message to this one,
the message filter function would run IN THE SENDER'S THREAD to see if
the message was ok. If it returned 'false', the message would be
discarded, otherwise it would be retained.

The problem with that is that it makes the sender vulnerable to Trojan
horses; the message filter function could do absolutely anything while
masquerading as the sender. What I needed was a way of declaring and
enforcing that the message filter function would execute in bounded time
with no side effects.

Does that sound familiar? It's basically an abstract pattern.

So here's the new proposal.

set_process_filter(fun #() when Guard -> Pattern end)

If the argument is a pattern expression of arity 0, then
when a message is sent to this process, the pattern will
be executed in the sender's thread, and if it matches, the
message will be sent, and if it fails, the message will be
discarded.

set_process_filter(fun #(Result) when Guard -> Pattern end)

If the argument is a pattern expression of arity 1, then
when a message is sent to this process, the pattern will
be executed in the sender's thread, and if it matches,
the Result will be sent, and if it fails, the message
will be discarded.

Of course these forms are only illustrative; there could be any number
of clauses and fun #F/0 or fun #F/1 could be used instead. The use of
set_process_filter/1 would be to allow simple protocol conversions.
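For comparison, here is a rough approximation of the idea in
present-day Erlang, as a filtering proxy process (the names
filter_for/2 and filter_loop/2 are mine). Unlike the proposal, the
predicate runs in the proxy's thread rather than the sender's, and
nothing enforces that it is side-effect free or bounded -- which is
exactly the gap the abstract-pattern version closes:

-module(filter_proxy).
-export([filter_for/2]).

%% Spawn a proxy that forwards to Dest only the messages Pred
%% accepts; Pred may also rewrite the message (the arity-1 form).
filter_for(Dest, Pred) ->
    spawn(fun () -> filter_loop(Dest, Pred) end).

filter_loop(Dest, Pred) ->
    receive
        Msg ->
            case Pred(Msg) of
                {true, Converted} -> Dest ! Converted;
                true              -> Dest ! Msg;
                false             -> ok
            end,
            filter_loop(Dest, Pred)
    end.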

Whether this feature is as useful as I think is debatable; Tobbe
argues that the best idea is to conceal processes inside their creating
modules, and I think he's probably right about that. The point is that
dynamic functions whose cost is certainly bounded are a useful thing to
have, and they fall out quite naturally from trying to take abstract
patterns seriously.
unknown
2003-09-23 12:52:58 UTC
Permalink
Yes, this is exactly what I wish for. :-)

Erlangers, is there any possibility that this will be added?

Cheers
/Fredrik
-----Original Message-----
From: Sven-Olof Nystr|m [mailto:svenolof]
Sent: den 23 september 2003 11:18
To: Sven-Olof Nystr|m
Cc: Sean Hinde; Richard A. O'Keefe; erlang-questions;
fredrik.linder
Subject: Re: User-defined guard (was: RE: Enhanced type guard syntax])
unknown
2003-09-24 04:38:57 UTC
Permalink
Luke Gorrie <luke> wrote
[that he likes the Common Lisp type system]

Henry Baker had a paper (available somewhere on the net)
on how it could have been better (better suited to inference).

The really nasty thing about the Common Lisp type system is
that (at least in certain compilation models) if you state that
something has a certain type the compiler will believe you, and
generate code that can crash or do insane things.

Does CMUCL fix that in any way?
unknown
2003-09-24 08:38:15 UTC
Permalink
Post by unknown
[that he likes the Common Lisp type system]
On further thought though, it is not what I want in a "soft-typing"
system for Erlang. It seems that the class of type errors that can be
unambiguously detected in regular Erlang (or Common Lisp) code is
pretty small.

In general I'd estimate about 10% of my time is spent writing code
that manipulates fairly complex data structures and is somewhat
painful to get right. I'm quite interested in nifty (and simple and
reliable) tools to make that code easier to write. I admit I have been
waiting for someone else to say "hey, I tried <erlang type system X>
and it works great!" before seriously looking into those.

(If the guys at work did a "cvs update" and found a static type
checker and a bunch of funky-looking declarations around the codebase,
I would be pelted with rotten fruit.)
Post by unknown
Henry Baker had a paper (available somewhere on the net)
on how it could have been better (better suited to inference).
"The Nimble Type Inferencer" I presume? Thanks for the tip, I'll give
it a read.
Post by unknown
The really nasty thing about the Common Lisp type system is
that (at least in certain compilation models) if you state that
something has a certain type the compiler will believe you, and
generate code that can crash or do insane things.
Does CMUCL fix that in any way?
Yes. In CMUCL type declarations are treated as assertions that must be
tested, just like guards in Erlang. It then uses type-inference both
to eliminate unnecessary type checks (like that adding two integers
will produce an integer, or adding two numbers between 0 and 255 won't
produce a bignum), and to generate fast type-specialised code. The
Erlang compiler apparently does a bit of this already to run Wings
fast, and I suppose it's really a compiler implementation issue at
heart.
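For reference, the Erlang side of the analogy in a one-line sketch:
the guard below is a runtime-checked assertion, and a compiler is
free to drop the check wherever inference proves it redundant:

add(X, Y) when is_integer(X), is_integer(Y) -> X + Y.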

NB: I've read on the CMUCL mailing list that their safety is not
absolutely perfect, but I believe it's very thorough. It's also
possible to explicitly disable type checking using the standard CL
"turn off safety" declaration, giving you fast code that will crash
horribly if it breaks promises about types.

Cheers,
Luke
unknown
2003-10-16 21:06:34 UTC
Permalink
Unfortunately I missed this discussion when it came out but I will give my
opinion now. I don't really know what others have said so I may just be
repeating someone else's opinion.

My first instinctive gut reaction was "Definitely not". It breaks one of
my/our basic principles, which I hope we wrote down somewhere.
If we didn't, we definitely should have. That is, you can use the
same pattern on both sides, for constructing and destructing a term,
and you can always do it. This should work with all terms, including
binaries. This new idea breaks this principle. I can't write:
A = 1,
B = 2,
{A/integer,B/integer} = {A/integer,B/integer},

Well, I can of course, but the meaning of /integer is different on each
side of =. Which is a Bad Thing! As I have said before, as I get
older I value consistency more. Just to be very clear here, it is not
the = itself that is important; it was just a way to
get patterns and terms together concisely.
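To sketch the principle as it holds today with the binary syntax the
proposal borrows from -- the same notation constructs and destructs,
and a term round-trips through its own pattern:

A = 1,
B = 2,
Bin = <<A:8, B:8>>,    % construction
<<A:8, B:8>> = Bin.    % destruction with the very same pattern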

My second, more thought-through reaction is still "Definitely not".

Now I will have to read what others have written.

Robert

----- Original Message -----
From: "Kenneth Lundin" <kenneth>
To: <erlang-questions>
Sent: Thursday, September 18, 2003 11:37 AM
Subject: Enhanced type guard syntax]