Discussion:
[fpc-devel] Dangerous optimization in CASE..OF
Martok
2017-07-01 22:26:28 UTC
Hi all,

continuing the discussion from bug 32079 here, as per request by Jonas. New
thread instead of the DFA one because this is about a concrete issue *with the
optimizer*.

TL;DR: there is an incredibly dangerous, unexpected and unavoidable
level1-optimization in the codegen for case..of, leading to potential arbitrary
code execution in any code that switches on enums.

The full story is going to get a bit long-winded, so grab a cup...


CASE $Ordinal OF is defined in the Reference Manual, 13.2.2 like so:
"""
The compiler will evaluate the case expression. If one of the case constants’
value matches the value of the expression, the statement that follows this
constant is executed. After that, the program continues after the final end.

If none of the case constants match the expression value, the statement list
after the else or otherwise keyword is executed. This can be an empty statement
list. If no else part is present, and no case constant matches the expression
value, program flow continues after the final end.
"""

Depending on the number of case labels, tcgcasenode.pass_generate_code
(ncgset.pas:1070) may choose one of 3 strategies for the "matching" task:
jumptable, jmptree or linearlist. Jmptree and linearlist are basically "lots of
if/else", while jumptable is a computed goto. The address of every possible
label's code block is put into a table that is then indexed in a jmp.
Example for x86, switching on a byte-sized #EXPRESSIONRESULT:
mov #EXPRESSIONRESULT,%al
sub #LOWESTLABEL,%al
and $0xff,%eax
jmp *0x40c000(,%eax,4)

What if EXPRESSIONRESULT is larger than the largest case label? To be safe from
that, the compiler generates a range check, so the example above actually looks
like this:
mov #EXPRESSIONRESULT,%al
sub #LOWESTLABEL,%al
cmp $(#HIGHESTLABEL-#LOWESTLABEL),%al
ja #ELSE-BLOCK
and $0xff,%eax
jmp *0x40c000(,%eax,4)

This is very fast because modern CPUs will correctly branch-predict the JA and
start caching the jumptable, so we effectively get the check for free; and it is
still way faster than the equivalent series of "if/else" of the other strategies
on old CPUs.
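As a rough model of the checked jumptable above, the following Free Pascal sketch stands in for the table with an array of function pointers indexed by (value - lowest label), guarded by one unsigned compare that plays the role of CMP/JA. The label values 10..12 and all names are made up for illustration; this is not how FPC's code generator is written.

```pascal
program JumpTableModel;
{$mode objfpc}{$H+}

{ Hypothetical model of a jumptable with a range check, labels 10..12 }

type
  TLabelCode = function: string;

const
  LowestLabel = 10;
  HighestLabel = 12;

function Label10: string; begin Result := 'label 10'; end;
function Label11: string; begin Result := 'label 11'; end;
function Label12: string; begin Result := 'label 12'; end;
function ElseBlock: string; begin Result := 'else'; end;

const
  { one entry per label, like the table at 0x40c000 }
  JumpTable: array[0..HighestLabel - LowestLabel] of TLabelCode =
    (@Label10, @Label11, @Label12);

function Dispatch(Value: Byte): string;
var
  Index: Byte;
begin
  { sub: wraps modulo 256, so values below LowestLabel become large }
  Index := Byte(Value - LowestLabel);
  { cmp/ja: a single unsigned compare rejects everything outside the table }
  if Index > HighestLabel - LowestLabel then
    Result := ElseBlock()
  else
    { jmp *table(,index,4): the computed goto }
    Result := JumpTable[Index]();
end;

begin
  WriteLn(Dispatch(11));   { label 11 }
  WriteLn(Dispatch(5));    { else: 5 - 10 wraps to 251 }
  WriteLn(Dispatch(200));  { else }
end.
```

Dropping the `if` in `Dispatch` corresponds to the optimization discussed next: any `Value` outside the labels would then read past the end of `JumpTable`.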

So far, so good. This is where Delphi is done and just emits the code.
FPC however attempts one more optimization (at LEVEL1, so at the same level that
enables jumptables in the first place): if at all possible, the range check is
omitted (which was probably a reasonable idea back when branches were always
expensive). The only criterion for that is if the highest and lowest value of
EXPRESSIONRESULT's type have case labels, i.e. if the jumptable will be "full".
This makes sense for simple basetypes like this:
case aByteVar of
0: DoSomething;
// ... many more cases
255: DoSomethingLast;
end;

The most likely place where one might encounter this, however, is with
enumerations. Here, the criterion becomes "are the first and last declared
element present as case labels?", and we're no longer necessarily talking about
highest and lowest value of the basetype.
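To make this concrete: for an enumeration the compiler reasons about `Low(T)..High(T)`, while the variable's storage can hold more than that. A minimal sketch with a made-up `TLight` type; `RawToLight` is a hypothetical helper that uses an explicit typecast, which (as discussed in this thread) is not range-checked.

```pascal
program EnumStorage;
{$mode objfpc}{$H+}

type
  TLight = (lRed, lYellow, lGreen);   { declared range: 0..2 }

{ Hypothetical helper: an explicit (unchecked) typecast, standing in for
  any way a raw value can end up in an enum variable }
function RawToLight(Raw: Integer): TLight;
begin
  Result := TLight(Raw);
end;

var
  L: TLight;
begin
  WriteLn(Ord(Low(TLight)), '..', Ord(High(TLight)));  { 0..2 }
  L := RawToLight(200);
  { 200: a jumptable that is "full" for lRed..lGreen has only 3 entries }
  WriteLn(Ord(L));
end.
```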

This is fine if (and only if) we can be absolutely sure that the
EXPRESSIONRESULT always is between [low(ENUM)..high(ENUM)] - otherwise %eax in
the example above may be anywhere up to high(basetype)'th element of the
jumptable, loading an address from anything that happens to be located after our
jumptable and jumping there. This is, I cannot stress this enough, extremely
dangerous! I expect not everyone follows recent security research topics, so
just believe me when I say that: if there is any way at all to jump "anywhere",
a competent attacker will find a way to make that "anywhere" be malicious code.

So, to be able to do that optimisation safely (remember, LEVEL1 should be safe
and not change the behaviour) we must be absolutely sure that this cannot happen.
Good thing we're talking about enumerations here, so only declared elements can
ever occur, right? Right!?
The only way to get data with an invalid value in an enum in Pascal is by
using an unchecked (aka explicit) typecast, by executing code without range
checking (assigning an enum from a larger parent enum type into a smaller
sub-enum type), or by having an uninitialised variable.
Turns out this is really not true. There are also such "esoteric" things as using
Read(File). Or TStream.Read. Or the socket implementation of your choice. Or by
calling a library function. There are many ways to have an invalid value in an
enum in any meaningful code. Pretty much everything that is not a direct
assignment from a constant is a potential candidate.
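A minimal sketch of the `TStream.Read` route, assuming a hypothetical `TOpcode` type stored in one byte (`{$PACKENUM 1}`): the untyped buffer read copies whatever byte is in the stream into the enum variable, with no check against the declared elements. `OpcodeFromByte` is an illustrative helper, not an RTL function.

```pascal
program UntypedRead;
{$mode objfpc}{$H+}
{$PACKENUM 1}  { store TOpcode in one byte, as a file format might }

uses
  Classes;

type
  TOpcode = (opNop, opLoad, opStore);  { declared range: 0..2 }

{ Hypothetical helper: round-trips one raw byte through a stream into
  a TOpcode, the way a file or network reader would }
function OpcodeFromByte(B: Byte): TOpcode;
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(B, SizeOf(B));
    Stream.Position := 0;
    { untyped read: the byte is copied as-is, nothing is validated }
    Stream.ReadBuffer(Result, SizeOf(Result));
  finally
    Stream.Free;
  end;
end;

begin
  WriteLn(Ord(OpcodeFromByte(1)));  { 1 = opLoad }
  WriteLn(Ord(OpcodeFromByte(7)));  { 7, although High(TOpcode) = opStore }
end.
```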

So, the only choice for us is to assume that enumerations are just what they are
in C: fancy constants on a base type. This is, incidentally, exactly the wording
used in the documentation cited above, so if we wanted to play by
'rules-as-written' that is exactly what we should have assumed anyway.
Just compile with {$RANGECHECKS ON}, then!
I've been trying really hard for the past couple of hours, but I haven't gotten
the compiler to emit a single check at all when doing anything with enums. And
even if it did, a runtime error is usually not what you want. You'd probably want to
tell the user a file is corrupt instead of killing the program...
But that still doesn't mean we have to worry about any of this in the
codegen for CASE..OF - just tell the programmer to manually check their input
after reading from wherever!
Well, yeah, except... there is no way to do that.
if EnumValue in [low(TEnumType)..high(TEnumType)] then
will not work for sparse enums or a basetype larger than Byte, and
case Enumvalue of
All,
Expected,
Values.... : doSomething;
else
raise EFileError.Create('Invalid data');
end;
will obviously also not work because this is just what we're trying to do here
in the first place (NB: this was my original use case).

So, we have a problem here: either the type system is broken because we can put
stuff in a type without being able to check if it actually belongs there, or
Tcgcasenode is broken because it (and _only_ it, as far as I can see) wants to
be clever by omitting an essentially free check for very little benefit.
I know which interpretation I would choose: the one with the easier fix ;-)


I would very much like for someone to at least acknowledge that there is a
problem here, because I can think of several more or less clever fixes for that
(except the obvious) and would prefer discussing these instead of having to
prove that "not-as-defined"-behaviour is not the same as "undefined behaviour".


Kind regards,

Martok
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://l
Florian Klämpfl
2017-07-02 08:19:39 UTC
Post by Martok
Depending on the number of case labels, tcgcasenode.pass_generate_code
jumptable, jmptree or linearlist. Jmptree and linearlist are basically "lots of
if/else", while jumptable is a computed goto. The address of every possible
label's code block is put into a table that is then indexed in a jmp.
mov #EXPRESSIONRESULT,%al
sub #LOWESTLABEL,%al
and $0xff,%eax
jmp *0x40c000(,%eax,4)
What if EXPRESSIONRESULT is larger than the largest case label? To be safe from
that, the compiler generates a range check, so the example above actually looks
mov #EXPRESSIONRESULT,%al
sub #LOWESTLABEL,%al
cmp $(#HIGHESTLABEL-#LOWESTLABEL),%al
ja #ELSE-BLOCK
and $0xff,%eax
jmp *0x40c000(,%eax,4)
This is very fast because modern CPUs will correctly branch-predict the JA and
start caching the jumptable so we effectively get the check for free, and still
If you read about branch prediction in a bit more detail, you will learn that it
stops working well as soon as there is more than one conditional jump per 16 bytes;
this applies to basically all CPUs.
Post by Martok
way faster than the equivalent series of "if/else" of the other strategies on
old CPUs.
So far, so good. This is where Delphi is done and just emits the code.
FPC however attempts one more optimization (at LEVEL1, so at the same level that
enables jumptables in the first place): if at all possible, the range check is
omitted (which was probably a reasonable idea back when branches were always
expensive). The only criterion for that is if the highest and lowest value of
EXPRESSIONRESULT's type have case labels, i.e. if the jumptable will be "full".
case aByteVar of
0: DoSomething;
// ... many more cases
255: DoSomethingLast;
end;
The most likely place where one might encounter this, however, is with
enumerations. Here, the criterion becomes "are the first and last declared
element present as case labels?", and we're no longer necessarily talking about
highest and lowest value of the basetype.
This is fine if (and only if) we can be absolutely sure that the
EXPRESSIONRESULT always is between [low(ENUM)..high(ENUM)] - otherwise %eax in
the example above may be anywhere up to high(basetype)'th element of the
jumptable, loading an address from anything that happens to be located after our
jumptable and jumping there. This is, I cannot stress this enough, extremely
dangerous! I expect not everyone follows recent security research topics, so
just believe me when I say that: if there is any way at all to jump "anywhere",
a competent attacker will find a way to make that "anywhere" be malicious code.
Indeed. The same problem as with any array on the stack. You have to ensure by any means, that the
index of the array is within the declared range of the array. If you have an array with an enum as
index, you have to ensure that the enum is within the declared range, else you get the same problem
as with the case.
Post by Martok
So, we have a problem here: either the type system is broken because we can put
stuff in a type without being able to check if it actually belongs there, or
Tcgcasenode is broken because it (and _only_ it, as far as I can see) wants to
be clever by omitting an essentially free check for very little benefit.
I know which interpretation I would choose: the one with the easier fix ;-)
Yes, checking the data. I can easily create a similar problem as above with the "range checks" for
the jump table by reading a negative value into the enum. Unfortunately, the checks are unsigned ...

The correct solution is to provide a function which checks an integer based on rtti if it is valid
for a certain enum. Everything else is curing only symptoms.
Michael Van Canneyt
2017-07-02 08:29:24 UTC
Post by Florian Klämpfl
Post by Martok
So, we have a problem here: either the type system is broken because we can put
stuff in a type without being able to check if it actually belongs there, or
Tcgcasenode is broken because it (and _only_ it, as far as I can see) wants to
be clever by omitting an essentially free check for very little benefit.
I know which interpretation I would choose: the one with the easier fix ;-)
Yes, checking the data. I can easily create a similar problem as above with the "range checks" for
the jump table by reading a negative value into the enum. Unfortunately, the checks are unsigned ...
The correct solution is to provide a function which checks an integer based on rtti if it is valid
for a certain enum. Everything else is curing only symptoms.
GetEnumName from typinfo will already do this for you.
We could add an additional function that just returns true or false.
Something as
function ValueInEnumRange(TypeInfo : PTypeInfo; AValue : Integer) : boolean;

If memory serves correctly, this will not work for enums that have explicitly assigned
numerical values, though.
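A rough sketch of what the proposed `ValueInEnumRange` might look like, assuming it simply compares against the `MinValue`/`MaxValue` fields that `GetTypeData` exposes for ordinal types in the `TypInfo` unit. The function itself is hypothetical (only a proposal in this thread, not part of `TypInfo`), the parameter is renamed to `ATypeInfo` to avoid shadowing the `TypeInfo()` intrinsic, and, per the caveat above, it is only meant for enums without explicitly assigned values.

```pascal
program EnumRangeRtti;
{$mode objfpc}{$H+}

uses
  TypInfo;

type
  TFruit = (frApple, frBanana, frCherry);

{ Hypothetical helper following the proposed signature: uses the
  MinValue/MaxValue range stored in the RTTI of an ordinal type }
function ValueInEnumRange(ATypeInfo: PTypeInfo; AValue: Integer): Boolean;
var
  Data: PTypeData;
begin
  Result := False;
  if (ATypeInfo = nil) or (ATypeInfo^.Kind <> tkEnumeration) then
    Exit;
  Data := GetTypeData(ATypeInfo);
  Result := (AValue >= Data^.MinValue) and (AValue <= Data^.MaxValue);
end;

begin
  WriteLn(ValueInEnumRange(TypeInfo(TFruit), 1));   { in range }
  WriteLn(ValueInEnumRange(TypeInfo(TFruit), 3));   { past frCherry }
  WriteLn(ValueInEnumRange(TypeInfo(TFruit), -1));  { below frApple }
end.
```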

Michael.
Martok
2017-07-02 12:14:22 UTC
Post by Michael Van Canneyt
These cases are without exception covered by the " unchecked (aka explicit)
typecast," part of Jonas's statement. Including Read(File).
Aye, that was kinda my point ;)
It is really hard to write code that interacts with the outside world without
having a validation problem.
If the validation code then breaks because the compiler thinks it's clever...
Post by Michael Van Canneyt
GetEnumName from typinfo will already do this for you.
We could add an additional function that just returns true or false.
Something as
function ValueInEnumRange(TypeInfo : PTypeInfo; AValue : Integer) : boolean;
Enum Typeinfo is horribly broken in so many ways except for the one simple case
needed for published properties, it definitely cannot be used in its current form.


That, probably (not sure about the timeline, but it makes sense to me), is part
of the core issue: Enumeration types have become way more powerful since this
optimization was introduced. Back then, nobody would have translated a C library
enum typedef as an enumerated type - simply because we didn't have sparse enums
then. Now we do, and so it is possible to use the typesafe way -- only that it
turns out to be less safe than a byte variable and some untyped constants.


Michael Van Canneyt
2017-07-02 13:21:28 UTC
Post by Martok
Post by Michael Van Canneyt
These cases are without exception covered by the " unchecked (aka explicit)
typecast," part of Jonas's statement. Including Read(File).
Aye, that was kinda my point ;)
It is really hard to write code that interacts with the outside world without
having a validation problem.
No, it is not, see my sample code. You're responsible for making sure that
the values that come from the outside world are valid. The compiler cannot
do that for you.
Post by Martok
If the validation code then breaks because the compiler thinks it's clever...
No, the compiler assumes it, and it alone, controls the possible values.
It acts on that assumption. That is why you have an enum to begin with.

Else you could and should use an integer and a set of constants.
Post by Martok
Post by Michael Van Canneyt
GetEnumName from typinfo will already do this for you.
We could add an additional function that just returns true or false.
Something as
function ValueInEnumRange(TypeInfo : PTypeInfo; AValue : Integer) : boolean;
Enum Typeinfo is horribly broken in so many ways except for the one simple case
needed for published properties, it definitely cannot be used in its current form.
Without a decent explanation, this is a very gratuitous statement, which we
will simply discard as not worthy of a reaction.

So please explain.

Michael.
Yury Sidorov
2017-07-02 09:59:29 UTC
Post by Florian Klämpfl
Post by Martok
So, we have a problem here: either the type system is broken because we can put
stuff in a type without being able to check if it actually belongs there, or
Tcgcasenode is broken because it (and _only_ it, as far as I can see) wants to
be clever by omitting an essentially free check for very little benefit.
I know which interpretation I would choose: the one with the easier fix ;-)
Yes, checking the data. I can easily create a similar problem as above with the "range checks" for
the jump table by reading a negative value into the enum. Unfortunately, the checks are unsigned ...
The correct solution is to provide a function which checks an integer based on rtti if it is valid
for a certain enum. Everything else is curing only symptoms.
Indeed, I've done some tests and found out that when range checking is enabled enums are not checked at all. Even array
access with enum index is not checked.
According to the docs, enums should be range-checked:
https://www.freepascal.org/docs-html/prog/progsu65.html#x72-710001.2.65

As Florian has said, the correct solution for this issue is to add range checking for enum types when range checking is
ON. Including the "CASE <enum> OF". The check via RTTI should be fine.
At least you will be able to generate slower but safe code by enabling range checks and overflow checks.

Yury.
Jonas Maebe
2017-07-02 16:20:12 UTC
Post by Yury Sidorov
Indeed, I've done some tests and found out that when range checking is
enabled enums are not checked at all. Even array access with enum index
is not checked.
https://www.freepascal.org/docs-html/prog/progsu65.html#x72-710001.2.65
Range checking code is generated for operations involving enums if,
according to the type system, the enum can be out of range. Just like
with integer sub-range types.

E.g., this generates a range check error:

{$r+}
type
tenum = (ea,eb,ec,ed);
tsubenum = eb..ec;
var
arr: array[tsubenum] of byte;
index: tenum;
begin
index:=ed;
writeln(arr[index]);
end.


Jonas
Ondrej Pokorny
2017-07-02 16:26:19 UTC
Post by Jonas Maebe
Range checking code is generated for operations involving enums if,
according to the type system, the enum can be out of range. Just like
with integer sub-range types.
Allow me a stupid question: how do I convert an integer to an enum with range
checking? A cast does not generate range checking, if I am not mistaken:

program Project1;

type
TEnum = (one, two);

{$R+}
var
I: Integer;
E: TEnum;
begin
I := 2;
E := TEnum(I); // <<< I want a range check error here
Writeln(Ord(E));
end.

Ondrej
Jonas Maebe
2017-07-02 16:28:39 UTC
Post by Ondrej Pokorny
Post by Jonas Maebe
Range checking code is generated for operations involving enums if,
according to the type system, the enum can be out of range. Just like
with integer sub-range types.
Allow me a stupid question: how to convert an integer to enum with range
checking?
The current possibilities and possibly improvements have been mentioned
elsewhere in this thread already
* http://lists.freepascal.org/pipermail/fpc-devel/2017-July/038013.html
* http://lists.freepascal.org/pipermail/fpc-devel/2017-July/038014.html


Jonas
Ondrej Pokorny
2017-07-02 16:43:17 UTC
Post by Jonas Maebe
Post by Ondrej Pokorny
Allow me a stupid question: how to convert an integer to enum with
range checking?
The current possibilities and possibly improvements have been
mentioned elsewhere in this thread already
* http://lists.freepascal.org/pipermail/fpc-devel/2017-July/038013.html
* http://lists.freepascal.org/pipermail/fpc-devel/2017-July/038014.html
Thanks, so there is no enumeration range checking from the compiler at
all :/ Everything has to be done manually :/

1.) if (I>=Ord(Low(TMyEnum))) and (I<=Ord(High(TMyEnum))) then

It's long and ugly, and it is manual checking on which the $RANGECHECKS
directive has no effect. (Yes, I use it in my code.)

2.) function ValueInEnumRange(TypeInfo : PTypeInfo; AValue : Integer) : boolean;

This still involves manual checking.

Another problem: RTTI is not generated for enums with explicit indexes,
if I am not mistaken: TEnum = (two = 2, four = 4).
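Option 1 above can at least be wrapped in a helper, so that the check happens on the raw integer before the typecast and an out-of-range value never reaches an enum variable (and thus never reaches a CASE or an array index). A minimal sketch for one hypothetical type `TMyEnum`; `IntToMyEnum` is an illustrative name, and raising `ERangeError` from `SysUtils` is only a guess at what a checked conversion might do.

```pascal
program CheckedEnumCast;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TMyEnum = (meFirst, meSecond, meThird);

{ Hypothetical helper: validate the raw Integer against the declared
  bounds BEFORE the typecast, while it is still a plain Integer }
function IntToMyEnum(I: Integer): TMyEnum;
begin
  if (I < Ord(Low(TMyEnum))) or (I > Ord(High(TMyEnum))) then
    raise ERangeError.CreateFmt('%d is not a valid TMyEnum value', [I]);
  Result := TMyEnum(I);
end;

begin
  WriteLn(Ord(IntToMyEnum(2)));
  try
    IntToMyEnum(5);
  except
    on E: ERangeError do
      WriteLn('rejected: ', E.Message);
  end;
end.
```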

---

IMO FPC/Pascal lacks an assignment operator for enums with range
checking. Something like:

EnumValue := IntegerValue as TEnum;

Are there any disadvantages of the enum-AS operator that prevent its
introduction?

Ondrej
Jonas Maebe
2017-07-02 16:49:00 UTC
Post by Ondrej Pokorny
Thanks, so there is no enumeration range checking from the compiler at
all :/
Yes, there is range checking for enums. No, there is no built-in checked
conversion from integer to arbitrary enumeration types. That's why I
suggested in the bug report that started this thread to file a feature
request for such a conversion.
Post by Ondrej Pokorny
Are there any disadvantages of the enum-AS operator that prevents its
introduction?
Someone else could already have code that overloads the AS-operator in
this way, and such a change would break this (you cannot overload
operators with a built-in meaning). I would be in favour of a new intrinsic.


Jonas
Ondrej Pokorny
2017-07-02 16:55:56 UTC
No, there is no built-in checked conversion from integer to arbitrary
enumeration types. That's why I suggested in the bug report that
started this thread to file a feature request for such a conversion.
Very good :)
Post by Ondrej Pokorny
Are there any disadvantages of the enum-AS operator that prevents its
introduction?
Someone else could already have code that overloads the AS-operator in
this way, and such a change would break this (you cannot overload
operators with a built-in meaning). I would be in favour of a new intrinsic.
If I am not mistaken, the AS operator cannot be overloaded - so no
chance to break legacy code here.

Ondrej
Ondrej Pokorny
2018-04-13 10:52:24 UTC
Post by Ondrej Pokorny
No, there is no built-in checked conversion from integer to arbitrary
enumeration types. That's why I suggested in the bug report that
started this thread to file a feature request for such a conversion.
Very good :)
Post by Ondrej Pokorny
Are there any disadvantages of the enum-AS operator that prevents
its introduction?
Someone else could already have code that overloads the AS-operator
in this way, and such a change would break this (you cannot overload
operators with a built-in meaning). I would be in favour of a new intrinsic.
If I am not mistaken, the AS operator cannot be overloaded - so no
chance to break legacy code here.
Because

1.) you agreed that a built-in checked conversion from integer to
arbitrary enumeration type is wanted and
2.) the AS-operator cannot be overloaded and thus the only argument
against enum support for AS is not valid

I introduced the AS operator for enumerators in
https://bugs.freepascal.org/view.php?id=33603

Ondrej
Michael Van Canneyt
2018-04-13 10:55:23 UTC
Post by Florian Klämpfl
Post by Ondrej Pokorny
No, there is no built-in checked conversion from integer to arbitrary
enumeration types. That's why I suggested in the bug report that started
this thread to file a feature request for such a conversion.
Very good :)
Post by Ondrej Pokorny
Are there any disadvantages of the enum-AS operator that prevents its
introduction?
Someone else could already have code that overloads the AS-operator in
this way, and such a change would break this (you cannot overload
operators with a built-in meaning). I would be in favour of a new intrinsic.
If I am not mistaken, the AS operator cannot be overloaded - so no chance
to break legacy code here.
Because
1.) you agreed that a built-in checked conversion from integer to arbitrary
enumeration type is wanted and
2.) the AS-operator cannot be overloaded and thus the only argument against
enum support for AS is not valid
I introduced the AS operator for enumerators in
https://bugs.freepascal.org/view.php?id=33603
Nice idea, Ondrej :)

Michael.
Sven Barth via fpc-devel
2018-04-13 12:08:39 UTC
No, there is no built-in checked conversion from integer to arbitrary
enumeration types. That's why I suggested in the bug report that started
this thread to file a feature request for such a conversion.
Very good :)
Are there any disadvantages of the enum-AS operator that prevents its
introduction?
Someone else could already have code that overloads the AS-operator in
this way, and such a change would break this (you cannot overload operators
with a built-in meaning). I would be in favour of a new intrinsic.
If I am not mistaken, the AS operator cannot be overloaded - so no chance
to break legacy code here.
Because
1.) you agreed that a built-in checked conversion from integer to
arbitrary enumeration type is wanted and
2.) the AS-operator cannot be overloaded and thus the only argument
against enum support for AS is not valid
I introduced the AS operator for enumerators in
https://bugs.freepascal.org/view.php?id=33603
What about enums with holes?
Also for constants there could be a compile time check.

Regards,
Sven
Ondrej Pokorny
2018-04-13 19:16:01 UTC
I introduced the AS operator for enumerations in
https://bugs.freepascal.org/view.php?id=33603
What about enums with holes?
No problem because Low() and High() work with these enums as well. See
the test case project I attached to the bug report - it has a test with
such an enum.
Also for constants there could be a compile time check.
That's true. You are welcome to improve the patch :)

Ondrej
Sven Barth via fpc-devel
2018-04-13 21:16:08 UTC
Post by Sven Barth via fpc-devel
I introduced the AS operator for enumerations in
https://bugs.freepascal.org/view.php?id=33603
What about enums with holes?
No problem because Low() and High() work with these enums as well. See the
test case project I attached to the bug report - it has a test with such an
enum.
I wasn't talking about the boundaries. I meant undefined values inside the
enum. If we want such a cast operator to work with such enums as well it
should check for invalid values inside the enum, too. Otherwise the
operator isn't worth it and should be forbidden for such enums.

Regards,
Sven
Ondrej Pokorny
2018-04-13 22:50:04 UTC
Post by Sven Barth via fpc-devel
I wasn't talking about the boundaries. I meant undefined values inside
the enum. If we want such a cast operator to work with such enums as
well it should check for invalid values inside the enum, too.
Otherwise the operator isn't worth it and should be forbidden for such
enums.
How can I know what you mean by "What about enums with holes?"? :)

Nevertheless, as I already said in the reply to Martok, there are no
undefined values inside an enum with assigned values. The values just
don't have an alias. See the Delphi docs:

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Simple_Types_(Delphi)#Enumerated_Types_with_Explicitly_Assigned_Ordinality

type Size = (Small = 5, Medium = 10, Large = Small + Medium);

An enumerated type is, in effect, a subrange whose lowest and highest
values correspond to the lowest and highest ordinalities of the
constants in the declaration. In the previous example, the Size type has
11 possible values whose ordinalities range from 5 to 15. (Hence the
type array[Size] of Char represents an array of 11 characters.) Only
three of these values have names, but the others are accessible through
typecasts and through routines such as Pred, Succ, Inc, and Dec.

IMO the docs are very clear about it. BTW, the Delphi 7 docs have the
same information:
http://docs.embarcadero.com/products/rad_studio/cbuilder6/EN/CB6_ObjPascalLangGuide_EN.pdf
see page 5-7 and 5-8
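The arithmetic from the quoted docs can be reproduced in Free Pascal with a sketch like the following. `Large` is written out as 15 instead of `Small + Medium`, the type is named `TSize`, and `SizeValueCount` is a hypothetical helper.

```pascal
program SparseEnum;
{$mode objfpc}{$H+}

type
  TSize = (Small = 5, Medium = 10, Large = 15);

{ Hypothetical helper: number of ordinal values between Low and High }
function SizeValueCount: Integer;
begin
  Result := Ord(High(TSize)) - Ord(Low(TSize)) + 1;
end;

var
  Raw: Integer;
  S: TSize;
begin
  WriteLn(Ord(Low(TSize)), '..', Ord(High(TSize)));  { 5..15 }
  WriteLn(SizeValueCount, ' values');                { 11 values }
  Raw := 7;
  S := TSize(Raw);   { inside Low..High, but has no name }
  WriteLn(Ord(S));   { 7 }
end.
```

Whether such unnamed values count as valid is the point Sven and Ondrej disagree on: a Low/High check accepts 7, a check against the named elements would not.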

Ondrej
Martok
2018-04-13 14:16:58 UTC
Post by Ondrej Pokorny
I introduced the AS operator for enumerators in
https://bugs.freepascal.org/view.php?id=33603
I'm still not convinced that cementing the FPC-ism of Ada-style high-level enums
is a good idea (and how that is even logically supposed to work with assigned
values), but if we want to go there, something like this feature is absolutely
required (Ada has it).

In that case, off the top of my head, succ/pred, for, bitsizeof and maybe sizeof
need to be fixed as well.
--
Regards,
Martok

Ceterum censeo b32079 esse sanandam. (Furthermore, I consider that bug 32079 must be fixed.)

Ondrej Pokorny
2018-04-13 19:21:06 UTC
Post by Martok
Post by Ondrej Pokorny
I introduced the AS operator for enumerators in
https://bugs.freepascal.org/view.php?id=33603
I'm still not convinced that cementing the FPC-ism of Ada-style high-level enums
is a good idea (and how that is even logically supposed to work with assigned
values), but if we want to go there, something like this feature is absolutely
required (Ada has it).
In that case, off the top of my head, succ/pred, for, bitsizeof and maybe sizeof
need to be fixed as well.
Why? I don't think so. Enums with assigned values are documented to be
valid in the whole range Low..High:

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Simple_Types_(Delphi)#Enumerated_Types_with_Explicitly_Assigned_Ordinality

type Size = (Small = 5, Medium = 10, Large = Small + Medium);

An enumerated type is, in effect, a subrange whose lowest and highest
values correspond to the lowest and highest ordinalities of the
constants in the declaration. In the previous example, the Size type has
11 possible values whose ordinalities range from 5 to 15. (Hence the
type array[Size] of Char represents an array of 11 characters.) Only
three of these values have names, but the others are accessible through
typecasts and through routines such as Pred, Succ, Inc, and Dec. In the
following example, "anonymous" values in the range of Size are assigned
to the variable X.

Ondrej
Ondrej Pokorny
2018-04-14 07:31:14 UTC
Post by Jonas Maebe
I would be in favour of a new intrinsic.
I have to admit that for some uses I would prefer a compiler intrinsic
that returns False instead of raising an exception. Something like:

function TryIntToEnum<T: type of enum>(const AValue: Integer; var AEnumValue: T): Boolean;

Ondrej
Michael Van Canneyt
2018-04-14 07:59:31 UTC
Post by Ondrej Pokorny
Post by Jonas Maebe
I would be in favour of a new intrinsic.
I have to admit that for some usages I would prefer a compiler intrinsic
function TryIntToEnum<T: type of enum>(const AValue: Integer; var
AEnumValue: T): Boolean;
Please, don't use this ridiculous generics syntax.
If it is a compiler intrinsic, then

function TryIntToEnum(T: atype; const AValue: Integer; var AEnumValue: aType): Boolean;

will work, just like typeinfo() or sizeof() works.

Michael.
Ondrej Pokorny
2018-04-14 08:08:41 UTC
Post by Ondrej Pokorny
Post by Jonas Maebe
I would be in favour of a new intrinsic.
I have to admit that for some usages I would prefer a compiler
intrinsic that returns False instead of raising an exception.
function TryIntToEnum<T: type of enum>(const AValue: Integer; var
AEnumValue: T): Boolean;
Please, don't use this ridiculous generics syntax. If it is a compiler
intrinsic, then
function TryIntToEnum(T: atype; const AValue: Integer; var AEnumValue: aType): Boolean;
will work, just like typeinfo() or sizeof() works.
The syntax I used was just for illustration. You don't type the <T: type
of enum> part in your own code.

What I wanted to say was that you can omit the enum type parameter (the
first parameter "T: atype" from your function). The type should be
possible to get directly from AEnumValue: aType:

function TryIntToEnum(const AValue: Integer; var AEnumValue: aType): Boolean; // aType being an enum type

If you like this description better :)

Effectively, you should be able to use:
var
  E: TMyEnum;
begin
  if TryIntToEnum(1, E) then

instead of
  if TryIntToEnum(TMyEnum, 1, E) then

Ondrej
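Until such an intrinsic exists, the proposed behaviour can be approximated per type. A minimal, non-generic sketch for a hypothetical `TMyEnum`; `TryIntToMyEnum` is an illustrative name, and like the proposal it takes `AEnumValue` as a `var` parameter, here leaving it untouched when the conversion fails.

```pascal
program TryConvert;
{$mode objfpc}{$H+}

type
  TMyEnum = (meFirst, meSecond, meThird);

{ Hypothetical stand-in for the proposed TryIntToEnum intrinsic:
  returns False instead of raising, and only writes AEnumValue
  when AValue lies inside the declared range }
function TryIntToMyEnum(const AValue: Integer; var AEnumValue: TMyEnum): Boolean;
begin
  Result := (AValue >= Ord(Low(TMyEnum))) and (AValue <= Ord(High(TMyEnum)));
  if Result then
    AEnumValue := TMyEnum(AValue);
end;

var
  E: TMyEnum;
begin
  E := meFirst;
  if TryIntToMyEnum(1, E) then
    WriteLn('converted: ', Ord(E));
  if not TryIntToMyEnum(42, E) then
    WriteLn('42 rejected, E unchanged: ', Ord(E));
end.
```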

Martok
2017-07-02 12:30:28 UTC
Post by Florian Klämpfl
Yes, checking the data. I can easily create a similar problem as above with the "range checks" for
the jump table by reading a negative value into the enum. Unfortunately, the checks are unsigned ...
Actually, fun fact, *fortunately* the checks are unsigned. Having a negative
value (or generally a value before the first element when that has a value
assignment) underflows on the check, and so gets caught by the CMP/JA as well.
Yes, I tried that, your code is safer than you think ;-)

Also, for sparse enums the "gaps" are filled with pointers to else-block, so the
check that is already there turns out to be always safe.

enum = (ela = 5, elb, elc, eld, ele);

1) enum value too small ( = 1):

mov 1,%al
sub 5,%al # al = -4 = $fc
cmp (9-5),%al # $fc > 4
ja $#ELSE-BLOCK # branches
and $ff,%eax
jmp *0x40c000(,%eax,4)

2) enum value in range or in gap (= 7 = elc)

mov 7,%al
sub 5,%al # al = 2
cmp (9-5),%al # 2 <= 4
ja $#ELSE-BLOCK # no branch
and $ff,%eax
jmp *0x40c000(,%eax,4)

3) enum value too large ( = 20)

mov 20,%al
sub 5,%al # al = 15
cmp (9-5),%al # 15 > 4
ja $#ELSE-BLOCK # branches
and $ff,%eax
jmp *0x40c000(,%eax,4)


Same thing on x86_64, where instead of al and eax we use eax and rax, with the
same underflow characteristics.
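
The effect of that single unsigned comparison can be reproduced in plain Pascal. This is a minimal sketch that assumes the label range 5..9 of the example enum above and 8-bit arithmetic as in the listings. GoesToElse is a hypothetical helper name, not anything in the compiler:

```pascal
program UnsignedRangeCheck;

{$mode objfpc}
{$R-}{$Q-}  // allow 8-bit wrap-around, like the SUB on %al

const
  // Label range of the example enum (ela = 5 .. ele = 9).
  LowLabel  = 5;
  HighLabel = 9;

// Emulates "sub #LOWESTLABEL,%al; cmp (#HIGHESTLABEL-#LOWESTLABEL),%al; ja ELSE":
// the subtraction result is truncated to 8 bits, so any value below LowLabel
// wraps around to a large unsigned number and fails the same single
// unsigned comparison that catches values above HighLabel.
function GoesToElse(Value: Byte): Boolean;
var
  Index: Byte;
begin
  Index := Byte(Value - LowLabel);
  Result := Index > (HighLabel - LowLabel);
end;

begin
  WriteLn('1:  ', GoesToElse(1));   // 1 - 5 wraps to 252, 252 > 4: else branch
  WriteLn('7:  ', GoesToElse(7));   // 7 - 5 = 2, 2 <= 4: jump table
  WriteLn('20: ', GoesToElse(20));  // 20 - 5 = 15, 15 > 4: else branch
end.
```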

Michael Van Canneyt
2017-07-02 08:40:44 UTC
Post by Martok
Hi all,
The only way to get data with an invalid value in an enum in Pascal is by
using an unchecked (aka explicit) typecast, by executing code without range
checking (assigning an enum from a larger parent enum type into a smaller
sub-enum type), or by having an uninitialised variable.
Turns out this is really not true. There are also such "esoteric" things as using
Read(File). Or TStream.Read. Or the socket implementation of your choice. Or by
calling a library function. There are many ways to have an invalid value in an
enum in any meaningful code. Pretty much everything that is not a direct
assignment from a constant is a potential candidate.
These cases are without exception covered by the "unchecked (aka explicit) typecast"
part of Jonas's statement. Including Read(File).

If you use Read(File) you are implicitly telling the compiler that the file
only contains valid values for the enum. If you yourself are not sure of this,
you must use file of integer instead.

If you check their definitions, you will see that they all use untyped
buffers to do their work. So all 'type safety' bets are off as soon as
you use one of these mechanisms. This is not only true of enums, but for
every data type.

The correct Pascal way is to do

var
  I : Integer;
  M : TMyEnum;

begin
  MyStream.ReadBuffer(I,SizeOf(I));
  if (I>=Ord(Low(TMyEnum))) and (I<=Ord(High(TMyEnum))) then
    M:=TMyEnum(I)
  else
    // error: I is not a valid TMyEnum ordinal, handle it here
end

Instead of

MyStream.ReadBuffer(M,SizeOf(M));

Which is inherently not safe, as it uses an untyped buffer.

In essence:
you are on your own as soon as you use external sources of values for enums.
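
A self-contained variant of the same check, with a TMemoryStream standing in for MyStream. TMyEnum's members and ReadMyEnum are hypothetical names chosen for the illustration, and the stream is assumed to hold 4-byte Integers:

```pascal
program SafeEnumRead;

{$mode objfpc}

uses
  Classes;

type
  // Hypothetical enum for the illustration.
  TMyEnum = (meRed, meGreen, meBlue);

// Reads a raw Integer and converts it to TMyEnum only after checking it
// against the declared range. Returns False (and leaves AValue unassigned)
// for out-of-range input.
function ReadMyEnum(AStream: TStream; out AValue: TMyEnum): Boolean;
var
  I: Integer;
begin
  AStream.ReadBuffer(I, SizeOf(I));
  Result := (I >= Ord(Low(TMyEnum))) and (I <= Ord(High(TMyEnum)));
  if Result then
    AValue := TMyEnum(I);
end;

var
  S: TMemoryStream;
  N: Integer;
  M: TMyEnum;
begin
  S := TMemoryStream.Create;
  try
    N := 1;                       // a valid ordinal (meGreen)
    S.WriteBuffer(N, SizeOf(N));
    N := 42;                      // not a valid TMyEnum ordinal
    S.WriteBuffer(N, SizeOf(N));
    S.Position := 0;
    if ReadMyEnum(S, M) then WriteLn('read ', Ord(M)) else WriteLn('rejected');
    if ReadMyEnum(S, M) then WriteLn('read ', Ord(M)) else WriteLn('rejected');
  finally
    S.Free;
  end;
end.
```

It should print "read 1" followed by "rejected"; the invalid value never reaches a TMyEnum variable.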

Michael.

Martok
2017-07-02 12:00:59 UTC
Is this made safe by always having an else/otherwise? If so, could the
compiler at least raise a warning if an enumeration was sparse but there
was no else/otherwise to catch unexpected cases?
Interestingly, not in FPC.

It was always my intuition, too, that the else block is also triggered for
invalid enum values (the docs even literally say that: "If none of the case
constants match the expression value") - and it *is* true in Delphi. In FPC it
is also mostly true, unless you happen to run into this optimisation.

Martok
2017-07-02 17:29:50 UTC
Post by Martok
This was also always my intuition that the else block is also triggered for
invalid enum values (the docs even literally say that, "If none of the case
constants match the expression value") - and it *is* true in Delphi.
There is a reason why this is true in Delphi: because this is the way it has
been documented in Borland products for at least 25 years!

I have checked with the TP7 language reference (it pays to keep books around),
which defines the following things:
- Enumeration element names are implicitly defined as typed constants of their
enum type
- The enum type is either Byte (<=256 elements) or Word.
- Subrange types are defined as the smallest type that can contain their range
- Case statements execute the statements of the matching case label, or the
else block otherwise

Note that they actually defined enumerations as what I called 'fancy constants'
before.


The Delphi 4 language reference (also in book form, which is a bit more detailed
than what is in the .hlp files) uses more precise language:
- Enumeration element names are implicitly defined as typed constants of their
enum type
- The enum type is either Byte, Word, or Longword, depending on $Z and element
count
- Subrange types are defined as the smallest type that can contain their range
- it is legal to inc/dec outside of a subrange, example from the book:
type Percentile = 1..99;
var I: Percentile;
begin
I:= 99;
inc(I); // I is now 100
So if this is a legal statement, subrange types can contain values outside of
their range. The description in the German version is "Die Variable wird in
ihren Basistyp umgewandelt", the variable becomes its base type.
- Case statements execute *precisely one* of their branches: the statements of
the matching case label, or the else block otherwise

So, in D4, we have enums as fancy constants, subrange-types are not safe (so
enums can also never be), and case statements cannot fail.


FPC's language reference has no formal definition of what enums or subranges
really are, and the same language as TP7 regarding case statements.


So at least in modes TP and DELPHI, the optimisation in question is formally wrong.

Florian Klämpfl
2017-07-02 17:39:30 UTC
Post by Martok
Post by Martok
This was also always my intuition that the else block is also triggered for
invalid enum values (the docs even literally say that, "If none of the case
constants match the expression value") - and it *is* true in Delphi.
There is a reason why this is true in Delphi: because this is the way it has
been documented in Borland products for at least 25 years!
I have checked with the TP7 language reference (it pays to keep books around),
- Enumeration element names are implicitly defined as typed constants of their
enum type
- The enum type is either Byte (<=256 elements) or Word.
- Subrange types are defined as the smallest type that can contain their range
- Case statements execute the statements of the matching case label, or the
else block otherwise
Note that they actually defined enumerations as what I called 'fancy constants'
before.
The Delphi 4 language reference (also in book form, which is a bit more detailed
- Enumeration element names are implicitly defined as typed constants of their
enum type
- The enum type is either Byte, Word, or Longword, depending on $Z and element
count
- Subrange types are defined as the smallest type that can contain their range
type Percentile = 1..99;
var I: Percentile;
begin
I:= 99;
inc(I); // I is now 100
So if this is a legal statement, subrange types can contain values outside of
their range. The description in the German version is "Die Variable wird in
ihren Basistyp umgewandelt", the variable becomes its base type.
- Case statements execute *precisely one* of their branches: the statements of
the matching case label, or the else block otherwise
So, in D4, we have enums as fancy constants, subrange-types are not safe (so
enums can also never be), and case statements cannot fail.
FPC's language reference has no formal definition of what enums or subranges
really are, and the same language as TP7 regarding case statements.
So at least in modes TP and DELPHI, the optimisation in question is formally wrong.
So this means:

var
b : boolean;

begin
b:=boolean(3);
if b then
writeln(true)
else if not(b) then
writeln(false)
else
writeln(ord(b));
end.

writes 3 in Delphi?

Martok
2017-07-02 17:51:02 UTC
Booleans are not enums in Delphi (not even ordinals), but their own little
thing. "if boolean_expr" is always a jz/jnz, no matter what. They are defined as
0=FALSE and "everything else"=TRUE

However:

var
b : boolean;
begin
b:=boolean(3);
if b = True then
writeln(true)
else if b = False then
writeln(false)
else
writeln(ord(b));
end.

That writes 3, which is why you should never compare against the boolean literals.
Some Winapi functions returning longbool rely on that.

Wait, that was a trick question, wasn't it?

Florian Klämpfl
2017-07-02 17:58:17 UTC
Post by Martok
Booleans are not enums in Delphi (not even ordinals),
They are:
http://docwiki.embarcadero.com/Libraries/XE5/en/System.Boolean
Post by Martok
but their own little
thing. "if boolean_expr" is always a jz/jnz, no matter what.
Yes. This is an optimization which is invalid as well if I follow your argumentation. Boolean(3)<>true.
Post by Martok
They are defined as
0=FALSE and "everything else"=TRUE
No, see link above.
Post by Martok
var
b : boolean;
begin
b:=boolean(3);
if b = True then
writeln(true)
else if b = False then
writeln(false)
else
writeln(ord(b));
end.
That writes 3,
Yes. What I wanted to point out: Delphi also does optimizations on enums which fail if one feeds in
invalid values.
Post by Martok
which is why your should never compare on the boolean lexicals.
Some Winapi functions returning longbool rely on that.
No, longbool is something different (even bytebool is).
Post by Martok
Wait, that was a trick question, wasn't it?
In the sense that it points out that Delphi, too, assumes enumeration variables always contain valid values.

Martok
2017-07-02 18:12:58 UTC
Post by Florian Klämpfl
http://docwiki.embarcadero.com/Libraries/XE5/en/System.Boolean
That prototype is a recent invention; it wasn't there in older versions. Also
the text sounds quite different somewhere else:
http://docwiki.embarcadero.com/RADStudio/XE5/en/Simple_Types#Boolean_Types
Post by Florian Klämpfl
Yes. What I wanted to point out: also delphi does optimizations on enums which fails if one feeds
invalid values.
Okay, if you want to believe that Booleans are enums:

b:=boolean(42);
if not b then
writeln('falsy')
else
writeln('truthy');

Prints truthy. Doesn't crash.



Florian Klämpfl
2017-07-02 18:18:47 UTC
Post by Martok
Post by Florian Klämpfl
http://docwiki.embarcadero.com/Libraries/XE5/en/System.Boolean
That prototype is a recent invention, it wasn't there in older versions. Also
*sigh* This has been the case since Pascal was ISO-standardized.
Post by Martok
http://docwiki.embarcadero.com/RADStudio/XE5/en/Simple_Types#Boolean_Types
Post by Florian Klämpfl
Yes. What I wanted to point out: also delphi does optimizations on enums which fails if one feeds
invalid values.
I do not believe, I know.
Post by Martok
b:=boolean(42);
if not b then
writeln('falsy')
else
writeln('truthy');
Prints truthy. Doesn't crash.
Yes, undefined behavior.

Ondrej Pokorny
2017-07-02 18:16:26 UTC
Post by Florian Klämpfl
var
b : boolean;
begin
b:=boolean(3);
if b then
writeln(true)
else if not(b) then
writeln(false)
else
writeln(ord(b));
end.
writes 3 in delphi?
IMO you picked up a Delphi compiler bug/undocumented feature (call it what
you want). "if boolean(3) then A" executes A contrary to the
documentation - the docs say something different than the compiler does.
You should not use it as an argument but create an issue report on
Embarcadero's Quality Central so that they either fix the documentation
or fix the compiler.

Whereas:
case boolean(3) of
True: A;
False: B;
else
C;
end;

is documented to execute C and the compiler executes C => good.

Ondrej

Florian Klämpfl
2017-07-02 17:47:55 UTC
Post by Martok
type Percentile = 1..99;
var I: Percentile;
begin
I:= 99;
inc(I); // I is now 100
Forgot to mention:
Tried with $r+ :)?
Post by Martok
So if this is a legal statement,
Well, it is a matter of definition whether a statement causing RTE 201 when compiled with $R+ is a
legal statement ...


Martok
2017-07-02 19:40:24 UTC
Post by Florian Klämpfl
Post by Martok
type Percentile = 1..99;
var I: Percentile;
begin
I:= 99;
inc(I); // I is now 100
Tried with $r+ :)?
That case is also documented. RTE in {$R+}, legal in {$R-}. That also means that
while you could make assumptions about the content in {$R+} (Delphi does not*),
you definitely cannot as soon as there is a single write in {$R-}. A C++
compiler could probably try tracing that using constness of variables and
parameters, but we cannot, and so must be defensive.

*) Even FPC makes no such assumptions in all other instances!

type
TF = 1..25;
var
t: TF;
begin
t:= TF(200);
if t in [1..50] then // tautology!
Writeln('a')
else
writeln('b');
end.

What does that print?
Yeah. As documented.
Check the codegen in R+: the if is still fully generated.
Only tcgcasenode does something else.


Honestly, I still don't understand why we're even having this discussion.
We're not talking about adding a new check - only not leaving one out that is
already there 99% of the time.
We're not talking about standardising some new behaviour - Borland did that
decades ago.
The correct behaviour is already documented in every Pascal language reference
(partly including our own), and is also the intuitive one.

I just don't get it. Why would you sacrifice the runtime safety, or, if you
prefer, the code compatibility, of your compiler over an (arguably wrong in at
least 2 modes) specific technicality of the type system that is adhered to
nowhere else?


Taking a break for now. Grading a thesis starts to sound like good relaxation.

Kind regards,

Martok





Florian Klämpfl
2017-07-02 20:02:05 UTC
Post by Martok
Honestly, I still don't understand why we're even having this discussion.
Because it is a fundamental question: if there is any defined behavior possible if a variable
contains an invalid value. I consider a value outside of the declared range as invalid, if it shall
be valid, change the declaration of the type.

Martok
2017-07-03 08:47:58 UTC
Good morning!
Post by Florian Klämpfl
Post by Martok
Honestly, I still don't understand why we're even having this discussion.
Because it is a fundamental question: if there is any defined behavior possible if a variable
contains an invalid value. _I consider a value outside of the declared range as invalid_,
(emphasis mine)
And this is where you disagree with Borland's explicit documentation, Borland's
implicit extensions via consistent compiler behaviour, and with at least ISO
7185:1990 (that revision has no concept of range checks, and explicitly allows
all operations other than constant assignment to exceed a subrange type).
If this is what you always had in mind for the FPC dialect, fair enough. It is
your project, after all :-) I shall submit appropriate change requests for the
documentation, as well as for several other simplifications for all other
conditionals except CASE..OF that then become possible. I will also submit
another set of change requests to *not* do that in modes TP, DELPHI and ISO for
code compatibility reasons. Probably a 'modeswitch strictenums' or something
like that.

To remind you: CASE..OF is currently the only statement that casts this concept
into code (grep -R getrange compiler/*). Everything else is consistent and
compatible.


Regards,

Martok

PS: starting a mail with "good morning" looks rather stupid if one then spends
two hours writing it. Hm.

Martok
2017-07-05 20:33:39 UTC
Hi all,
Post by Florian Klämpfl
Post by Martok
Honestly, I still don't understand why we're even having this discussion.
Because it is a fundamental question: if there is any defined behavior possible if a variable
contains an invalid value. I consider a value outside of the declared range as invalid
So, as this is the core of all this, I have spent the last few days asking
various users of Pascal dialects on different compilers, intentionally without
telling them what this was about. Not a single one considered out-of-range
ordinal values to be something bad (though not terribly useful), let alone
something causing undefined behaviour: all assumed that they would continue to
behave like ordinals in comparisons.

Something I hadn't known, and which I find quite funny: that group apparently
includes Anders Hejlsberg, who wrote the original Turbo Pascal compiler and
years later specifically defined C# enums contrary to your assumption. In fact,
this entire thread's topic is an actual example in the language reference:
<https://docs.microsoft.com/en-gb/dotnet/csharp/language-reference/keywords/enum>

I haven't yet told all of them why I asked (one set of answers comes from a
forum thread that I don't want to spoil yet, maybe tomorrow evening), but those
who I asked in private all have at some point written code that relies on that
concept and are "irritated" why that wouldn't work in FPC.

All that seems to leave only one conclusion...


Kind regards,

Martok

Martok
2017-07-13 19:34:42 UTC
Hi all,

any new ideas on this issue?

I've been thinking about this a lot, and I do see where you're coming from.
There is some theoretical advantage in treating enums like that. Only one minor
issue: a language with that interpretation does not appear to be Pascal...

You can find some results of my investigations here:
<https://www.entwickler-ecke.de/viewtopic.php?p=707764#707764>
(German-language forum post, but I know many of the core team are or can read
German anyway; I can provide a translation if you want)

Regardless of whether there may be some argument for this language change, I'm
still a firm believer in "don't surprise the user". There is literally no
precedent that this simplification has ever been done in any Pascal compiler
(quite the contrary), and there is no written hint that FPC does it either.
Basically, if people with some 30-ish years of experience (and always keeping up
In TP and {$R+}, an aValue outside the range would produce a RangeCheckError.
In {$R-} it would not, at least as long as the data type is not overrun {$Z..}.
So the jump into the else branch should always be unambiguously defined.
I would consider any other reaction a security problem; then Pascal would no
longer have any advantage.
I also read all of ncg*.pas again with respect to range simplifications, and it
turns out that there really is only one instance where we simplify to undefined
behaviour: tcgcasenode. tcginnode just produces the else-branch faster for
x>=high(setbasetype) (without bittests), but is still defined. All others work
with the base integer type only.
Point is: there is really no unrelated side effect at all if we were to align
FPC with all the other Pascals out there.


Kind regards,

Martok



Ondrej Pokorny
2017-07-14 18:21:39 UTC
Post by Florian Klämpfl
Post by Martok
Honestly, I still don't understand why we're even having this
discussion.
Because it is a fundamental question: if there is any defined behavior
possible if a variable
contains an invalid value. I consider a value outside of the declared
range as invalid, if it shall
be valid, change the declaration of the type.
In this case, please fix the compiler so that it doesn't generate
invalid values by default:

program Project1;

{$mode objfpc}{$H+}

type
TMyEnum = (one = 1, two);

TMyClass = class
public
Enum: TMyEnum;
end;

var
O: TMyClass;
begin
O := TMyClass.Create;
case O.Enum of
one: WriteLn('1');
two: WriteLn('2');
else
WriteLn('something wrong :/');
end;
end.

Ondrej

Ondrej Pokorny
2017-07-14 18:46:33 UTC
Btw, when compiling this program with default Lazarus Release build mode:

program Project1;

{$mode objfpc}{$H+}

type
TMyEnum = (zero);

function MyFunc(const aEnum: TMyEnum): string;
begin
case aEnum of
zero: Result := '0';
end;
end;

begin
WriteLn(MyFunc(zero));
end.

I get a warning:

Compile Project, Mode: Release, Target: project1.exe: Success, Warnings: 1
project1.lpr(9,1) Warning: function result variable of a managed type
does not seem to be initialized
17 lines compiled, 0.1 sec, 34672 bytes code, 1316 bytes data

Ondrej

Ewald
2017-07-14 18:52:12 UTC
[snip]
Compile Project, Mode: Release, Target: project1.exe: Success, Warnings: 1
project1.lpr(9,1) Warning: function result variable of a managed type does not seem to be initialized
17 lines compiled, 0.1 sec, 34672 bytes code, 1316 bytes data
IIRC that was exactly the example with which the original thread started (`Data flow analysis (dfa) and "case ... of"`).
--
Ewald

Jonas Maebe
2017-07-15 10:45:11 UTC
Post by Ondrej Pokorny
In this case, please fix the compiler so that it doesn't generate
program Project1;
{$mode objfpc}{$H+}
type
TMyEnum = (one = 1, two);
TMyClass = class
public
Enum: TMyEnum;
end;
Classes are explicitly documented to initialise their contents with
zero, just like local variables are explicitly documented to be not
initialised at all. The fact that the bitpattern for zero is a valid
value for most types, does not change the fact that you remain
responsible for initialising data before use.

That is also why we warn when you use a local ansistring variable
without initialising it first: even though it won't crash your program
(due to underlying needs by the reference counting mechanism), it is
still a logic error. Unfortunately, keeping track of the initialised
status of fields of individual class instances is quite hard, so the
compiler cannot warn you about such cases at this time.
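
In practice this means giving such fields a valid value in the constructor. A minimal sketch based on Ondrej's example, assuming that "one" is an acceptable default for TMyEnum:

```pascal
program InitEnumField;

{$mode objfpc}

type
  TMyEnum = (one = 1, two);

  TMyClass = class
  public
    Enum: TMyEnum;
    constructor Create;
  end;

constructor TMyClass.Create;
begin
  inherited Create;
  // Instance memory is zero-filled, and 0 is not a declared TMyEnum value,
  // so assign a valid value explicitly before the field is ever read.
  Enum := one;
end;

var
  O: TMyClass;
begin
  O := TMyClass.Create;
  try
    case O.Enum of
      one: WriteLn('1');   // taken: prints "1"
      two: WriteLn('2');
    else
      WriteLn('something wrong :/');
    end;
  finally
    O.Free;
  end;
end.
```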


Jonas

l***@kluug.net
2017-07-15 18:52:32 UTC
On Sat., Jul. 15, 2017 12:45, Jonas Maebe wrote:
Classes are explicitly documented to initialise their contents with
zero
Excuse me, but there is a contradiction in your reasoning :)
On one hand, you try to be very fundamental about enums - you say that only declared enum values are valid. And there is no zero value for TMyEnum. TMyEnum is declared as follows:
TMyEnum = (one = 1, two);
TMyEnum is not a number so it cannot be initialized to zero. Because there is no zero. There are only two values, "one" and "two". I don't care about the bit pattern of the enum value - this is an implementation detail for me (that could change in the future just the same as the CASE statement changed).
On the other hand you say it is documented to be initialised to zero. So you say that an enumeration is an integer value with aliases for number values.
Well, you have 2 ways of solving this:
1.) HIGH-LEVEL enumeration: You say that an enumeration can have a value from a strict set of identifiers. In this case, you have to handle it like this in all cases:
1a) EnumValue := TEnum(IntegerValue) has to assign an always valid value to EnumValue (without range checking) or has to raise a range check error (with range checking) if IntegerValue is not allowed in TEnum. It should be the same as when you assign an Int64 value to an Integer field - you always get a valid integer (without range checking) !!!
1b) You have to initialize EnumValue in objects to valid values, whatever the bit pattern is (bit patterns do not exist in high-level enumerations).
1c) Inc/Dec has to increase/decrease the enum values and not ordinal values:
TEnum = (zero, two=2);
MyEnum := zero;
Inc(MyEnum); -> has to increase MyEnum to two, not 1.
1d) In this case you can leave the CASE optimization as it is.
- OR -
2.) LOW-LEVEL enumeration: You say that an enumeration is an ordinal type with enumeration values being only aliases for underlying ordinal values. An enumeration can have any possible value that is allowed by the ordinal type (Byte, Word, whatever).
=> 1a) + 1b) + 1c) + 1d) are not valid any more.
-----
Conclusion: you did only 1d - so you have done only 1 point out of 4 (I may have forgotten some). All in all, the HIGH-LEVEL enumeration approach cannot be used in Pascal at all (you cannot fix 1a-1c) - because of code speed and/or compatibility reasons.
So, IMO the HIGH-LEVEL enum approach is wrong along with the CASE optimization.
I understand what you say about the validity of enum values - but you did only the CASE optimization, not the other steps that belong with it (1a-1c). If you want to keep the CASE optimization, you have to fundamentally change the enum type and fix 1a-1c.
That is also why we warn when you use a local ansistring variable
without initialising it first: even though it won't crash your program
(due to underlying needs by the reference counting mechanism), it is
still a logic error.

Local variables are off-topic.
Ondrej

Jonas Maebe
2017-07-15 19:06:42 UTC
Post by l***@kluug.net
On one hand, you try to be very fundamental about enums - you say that
only declared enum values are valid. And there is no zero value for
TMyEnum = (one = 1, two);
TMyEnum is not a number so it cannot be initialized to zero.
I have said from the start that it is possible to store invalid values
in variables through the use of a.o. pointers (which is what the class
zeroing does), explicit typecasts and assembly.
Post by l***@kluug.net
On the other hand you say, it is documented to be declared to zero.
I say that the memory occupied by class instance fields is documented to
be initialised with the bit pattern for zero. If the bit pattern zero is
not a valid value for a particular type, using a variable of that type while
it contains that bit pattern is undefined. Just like using a local
variable without first assigning a valid value to it is undefined.
Post by l***@kluug.net
So
you say that an enumeration is an integer value with aliases for number
values.
Not anymore than I say that a shortstring is a 2048 bit integer with
aliases for integer values. It's true that eventually everything is
expressed in bits. That is, however, completely besides the point when
it comes to the semantics of a type system.


Jonas

l***@kluug.net
2017-07-15 19:33:10 UTC
On Sat., Jul. 15, 2017 21:07, Jonas Maebe wrote:
On 15/07/17 20:52, ***@kluug.net (mailto:***@kluug.net) wrote:
On one hand, you try to be very fundamental about enums - you say that
only declared enum values are valid. And there is no zero value for
TMyEnum. TMyEnum is declared as follows:

TMyEnum = (one = 1, two);

TMyEnum is not a number so it cannot be initialized to zero.

I have said from the start that it is possible to store invalid values
in variables through the use of a.o. pointers (which is what the class
zeroing does), explicit typecasts and assembly.

In that case you must not prevent us from working with invalid values in a deterministic way.

Ondrej

Jonas Maebe
2017-07-15 19:39:18 UTC
Post by Jonas Maebe
I have said from the start that it is possible to store invalid values
in variables through the use of a.o. pointers (which is what the class
zeroing does), explicit typecasts and assembly.
In this case you must not restrict us to work with invalid values in a deterministic way.
You can if you always use explicit typecasts to different types and
access everything through pointers and assembly. But then the question
is why you want to use a restrictive type in the first place.

Either you declare a type as only holding a limited set of data when
valid and assume it behaves as such, or you don't. A mixture is the
worst of both worlds: no type safety and unexpected behaviour when
something else assumes the type declaration actually means what is written.


Jonas

Ondrej Pokorny
2017-07-16 07:12:44 UTC
Post by Jonas Maebe
Post by Jonas Maebe
I have said from the start that it is possible to store invalid values
in variables through the use of a.o. pointers (which is what the class
zeroing does), explicit typecasts and assembly.
In this case you must not restrict us to work with invalid values in a deterministic way.
You can if you always use explicit typecasts to different types and
access everything through pointers and assembly. But then the question
is why you want to use a restrictive type in the first place.
Either you declare a type as only holding a limited set of data when
valid and assume it behaves as such, or you don't. A mixture is the
worst of both worlds: no type safety and unexpected behaviour when
something else assumes the type declaration actually means what is written.
The problem is that you yourself force us to the mixture that "is the
worst of both worlds".

On the one hand you say that the compiler can generate invalid values
and on the other hand you say that the compiler can assume the enum
holds only valid values. For now, there is absolutely no range checking
and type safety for enums - so you can't use it as an argument.

You say "you declare a type as only holding a limited set of data when
valid and assume it behaves as such" - yes I declare it as such but the
compiler itself stores invalid data there. The compiler cannot assume
the data holds only valid values if it happily stores invalid values itself!

Don't you understand the difference between the compiler's point of view and
the programmer's point of view?

Compiler layer:
- stores invalid enumeration values -> it cannot assume there are no
invalid values

Programmer layer (two options - his decision):
1.) he checks all values in the enums himself and does range checking
manually -> he and only he (not the compiler) can assume there are no
invalid values.
2.) he doesn't do manual range checking -> he cannot assume there are no
invalid values.

Again, you have two options:

1.) Give us full type safe enums with full range checking that CANNOT
hold invalid values after any kind of operation (pointer, typecast,
assembler ...). Then I am fully with you: keep the case optimization as
it is (and introduce more optimizations).

2.) Keep the enums as they are (not type safe) and don't do any
optimizations on type safety assumptions on the compiler level. Because
there is no type safety.

As far as I know, option (1) is utopian in low-level languages such as
Pascal.

For reference GNU-C:
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf page 11:
"An enumeration is a custom data type used for storing constant integer
values and referring to them by names."

Pascal stores enumeration values the same way as C. It doesn't make
sense to handle enums the way you want to (which would make sense in
high-level programming languages, which Pascal is not).

Ondrej
Michael Van Canneyt
2017-07-16 09:07:12 UTC
Post by Jonas Maebe
I have said from the start that it is possible to store invalid values
in variables through the use of a.o. pointers (which is what the class
zeroing does), explicit typecasts and assembly.
In this case you must not restrict us to work with invalid values in a
deterministic way.
You can if you always use explicit typecasts to different types and access
everything through pointers and assembly. But then the question is why you
want to use a restrictive type in the first place.
Either you declare a type as only holding a limited set of data when valid
and assume it behaves as such, or you don't. A mixture is the worst of both
worlds: no type safety and unexpected behaviour when something else assumes
the type declaration actually means what is written.
The problem is that you yourself force us to the mixture that "is the worst
of both worlds".
On the one hand you say that the compiler can generate invalid values and on
the other hand you say that the compiler can assume the enum holds only valid
values. For now, there is absolutely no range checking and type safety for
enums - so you can't use it as an argument.
You are missing the point of an enumerated.

The whole point of using an enumerated is that range checking
*is not necessary*, because the values are 'by definition' correct.

If the compiler cannot assume this, you're just using an integer with
some named values. Hardly worth a separate type.

Michael.
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin
Ondrej Pokorny
2017-07-16 18:25:33 UTC
Post by Michael Van Canneyt
You are missing the point of an enumerated.
The whole point of using an enumerated is that range checking *is not
necessary*, because the values are 'by definition' correct.
If the compiler cannot assume this, you're just using an integer with
some named values. Hardly worth a separate type.
No, I am not missing the point. I am trying to explain to you that if you
understand enumerated types as strict values from a set, you have to
completely redesign the way they work (see my previous emails) - which is
not doable with the current Pascal philosophy.

For now, Pascal enumerated types work as aliases for underlying ordinal
values - a concept that is exactly the same as C enums:

https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf page 11:
"An enumeration is a custom data type used for storing constant integer
values and referring to them by names."

page 12:
"Although such variables are considered to be of an enumeration type,
you can assign them any value that you could assign to an int variable,
including values from other enumerations. Furthermore, any variable that
can be assigned an int value can be assigned a value from an enumeration."

What you did is introduce one feature of high-level enumerations
(the case optimization) while keeping all the other features of low-level
enumerations. Do it properly (i.e., break existing code) or don't do it at all.

Well, I have used all the arguments I could think of...

Ondrej
Florian Klämpfl
2017-07-16 18:42:38 UTC
For now, Pascal enumerated types work as aliases for underlying ordinal values - a concept that is
Very good point:

***@ubuntu64:~$ cat test.cc
#include <stdio.h>

enum tenum { e1,e2,e3,e4,e5,e6,e7,e8 };

int f(tenum e)
{
  switch (e)
  {
    case e1:
      printf("Hello 1 %d\n",e1);
      return 1;
    case e2:
      return 354;
    case e3:
      return 351;
    case e4:
      return 315;
    case e5:
      return 35;
    case e6:
      printf("Hello asdf\n");
      return 1;
    case e7:
      printf("Hello \n");
      return 2;
    case e8:
      printf("Hello\n");
      return 3;
  }
}

int main()
{
  f(tenum(12));
}
***@ubuntu64:~$ clang test.cc
***@ubuntu64:~$ ./a.out
Ungültiger Maschinenbefehl (Speicherabzug geschrieben)
***@ubuntu64:~$ clang test.cc -O3
***@ubuntu64:~$ ./a.out
***@ubuntu64:~$ clang --version
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

"Ungültiger Maschinenbefehl (Speicherabzug geschrieben)" = Invalid opcode (memory dump written).
Why? Because it does not range check before entering the jump table.

Funnily enough, clang does not create crashing code with -O3, as it removes all the code :). To get a
crash, you would probably have to compile the two functions separately; the assembler code for f() suggests this.
Ondrej Pokorny
2017-07-16 19:08:48 UTC
Post by Florian Klämpfl
"Ungültiger Maschinenbefehl (Speicherabzug geschrieben)" = Invalid
opcode (memory dump written).
Why? Because it does not range check before entering the jump table.
OK, I confess I am not a C guy (at university I hated it even more
than Fortran). But for me:

default-***@ondrej-linux:~/ctest$ cat test.c
#include <stdio.h>

typedef enum { e1,e2,e3,e4,e5,e6,e7,e8 } tenum;

int f(tenum e)
{
  switch (e)
  {
    case e1:
      printf("Hello 1 %d\n",e1);
      return 1;
    case e2:
      return 354;
    case e3:
      return 351;
    case e4:
      return 315;
    case e5:
      return 35;
    case e6:
      printf("Hello asdf\n");
      return 1;
    case e7:
      printf("Hello \n");
      return 2;
    case e8:
      printf("Hello\n");
      return 3;
    default:
      printf("default");
      return 0;
  }
}

int main()
{
  f(12);
}

***@ondrej-linux:~/ctest$ gcc test.c
***@ondrej-linux:~/ctest$ ./a.out
***@ondrej-linux:~/ctest$

No error if I remove the default statement.

Ondrej
Florian Klaempfl
2017-07-16 19:13:06 UTC
Post by Ondrej Pokorny
Post by Florian Klämpfl
"Ungültiger Maschinenbefehl (Speicherabzug geschrieben)" = Invalid
opcode (memory dump written).
Why? Because it does not range check before entering the jump table.
OK, I confess I am not a C guy (I hated it in the university even more
...
Post by Ondrej Pokorny
No error if I remove the default statement.
Indeed. This proves exactly the point: undefined behaviour. The code
behaves randomly depending on the compiler used and even the optimizer
switches.
Ondrej Pokorny
2017-07-16 21:36:43 UTC
Post by Florian Klaempfl
Indeed. This proves the exactly point. Undefined behaviour. The code
behaves randomly dependent on the compiler used and even the optimizer
switches.
OK, I see now: there is a difference between C enums and C++ enums. Your
example was about C++ enums. My example was about C enums. The C enums
are defined to allow any integer value, whereas C++ enums are strongly
typed.

C enums: https://msdn.microsoft.com/en-us/library/whbyts4t.aspx
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
C++ enums: https://msdn.microsoft.com/en-us/library/2dzy4k6e.aspx

I have to admit that Pascal enums are like C++ enums (strong typed) and
not like C enums. So yes, you are right.

Ondrej
Martok
2017-07-16 22:04:01 UTC
Post by Ondrej Pokorny
OK, I see now: there is a difference between C enums and C++ enums. Your
example was about C++ enums. My example was about C enums. The C enums
are defined to allow any integer value, whereas C++ enums are strongly
typed.
In the pages cited, there's no mention of valid ranges, only that for C++ enums,
you need static_cast<>, and that "In the original C and C++ enum types, the
unqualified enumerators are visible throughout the scope in which the enum is
declared. In scoped enums, the enumerator name must be qualified by the enum
type name." That's just our $SCOPEDENUMS switch.
So it really is undefined and clang takes the unsafe option. Sounds familiar.
Post by Ondrej Pokorny
C enums: https://msdn.microsoft.com/en-us/library/whbyts4t.aspx
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
C++ enums: https://msdn.microsoft.com/en-us/library/2dzy4k6e.aspx
C# enums:
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/enum

"A variable of type Days can be assigned any value in the range of the
underlying type; the values are not limited to the named constants."

As I mentioned in passing before: if we take a C-style language as a reference,
we should probably use one that shares more of our ideas (and the lead designer).
Martok
2017-07-16 19:31:03 UTC
You (Florian) do realize that it's almost impossible to write a C++ program that
is not technically undefined? Their 'standards' are worse than our
'implementation-defined'.

FWIW, GCC agrees with Low-Level Enums, and given that clang regularly catches
hate when their 'optimizations' break stuff like the Linux kernel again...

g++:
40058b: 8b 45 fc mov -0x4(%rbp),%eax
40058e: 83 f8 07 cmp $0x7,%eax
400591: 77 76 ja 400609 <_Z1f5tenum+0x89>
400593: 89 c0 mov %eax,%eax
400595: 48 8b 04 c5 d8 06 40 mov 0x4006d8(,%rax,8),%rax
40059c: 00
40059d: ff e0 jmpq *%rax

g++ -O3:
4005a4: 83 ff 07 cmp $0x7,%edi
4005a7: 77 14 ja 4005bd <_Z1f5tenum+0x1d>
4005a9: 89 ff mov %edi,%edi
4005ab: ff 24 fd 18 07 40 00 jmpq *0x400718(,%rdi,8)

Proving my point that we should aim to be better and safer than C, not worse.
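For readers less familiar with x86, the g++ listings above amount to the following hand-written C sketch: an unsigned compare against the highest table index (the `cmp $0x7` / `ja` pair) guards an indexed indirect jump, so any selector outside 0..7 goes to the else branch instead. The handler functions and the `dispatch` name are illustrative assumptions; the return values are borrowed from the test program earlier in the thread, with the `printf` calls omitted.

```c
#include <stddef.h>

/* Illustrative stand-ins for the bodies of the case labels e1..e8. */
typedef int (*case_handler)(void);

static int on_e1(void) { return 1; }
static int on_e2(void) { return 354; }
static int on_e3(void) { return 351; }
static int on_e4(void) { return 315; }
static int on_e5(void) { return 35; }
static int on_e6(void) { return 1; }
static int on_e7(void) { return 2; }
static int on_e8(void) { return 3; }
static int on_else(void) { return -1; } /* hypothetical else/default branch */

/* One entry per case label, indexed by the enumerator value 0..7. */
static const case_handler jump_table[] = {
    on_e1, on_e2, on_e3, on_e4, on_e5, on_e6, on_e7, on_e8
};

int dispatch(int selector)
{
    /* Converting to unsigned makes negative selectors wrap to huge values,
       so one unsigned compare rejects both ends, like "cmp $0x7; ja". */
    size_t index = (size_t)(unsigned int)selector;
    if (index >= sizeof jump_table / sizeof jump_table[0])
        return on_else();       /* out of range: never index the table */
    return jump_table[index](); /* in range: indexed indirect call */
}
```

Because the comparison is unsigned, a selector such as -1 is rejected by the same single check as 12; omitting that check is exactly what would turn an unnamed value into a jump through arbitrary memory.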
Martok
2017-07-25 17:31:08 UTC
As has just been pointed out to me, we all misdiagnosed that example.

TL;DR: you did test undefined behaviour, only a different one. The example
actually proves that clang and gcc agree with the MSDN article in that (even
simple) C++ enums are low-level.


I have verified that Clang/LLVM generates the same code as GCC for the switch
statement itself. It does correctly check for the maximum jmptable index (via
[sub 0x7;ja]) and only then jumps.

LLVM however does generate an UD2(0F0B) trap instruction for several programmer
errors, such as the control flow reaching the end of a non-void function without
return. *That* is why we get a SIGILL when no switch label matches.
In -O1 and -Os, these debug instructions are removed again.

As predicted, there does indeed appear to be a Clang bug: GCC correctly warns
"control reaches end of non-void function", while Clang only emits the UD2
instruction (so it detected it) and does not print the warning.
*snip*
return 2;
printf("Hello\n");
return 3;
}
return -1;
}
int main()
*snip*
Another equivalent solution is to have the return in the switch statement's
default label. No UD2 is emitted then.
It follows that the compiler concludes that the default can be matched, even
when all named elements are listed. Therefore, enum variables may contain
unnamed values. Therefore, C++ enums must be Low-Level. QED.
"Ungültiger Maschinenbefehl (Speicherabzug geschrieben)" = Invalid opcode (memory dump written).
Why? Because it does not range check before entering the jump table.
I really should have noticed that. A jump into nonexecutable memory would be
SIGSEGV, not SIGILL.

--
Martok

Martok
2017-08-03 10:27:18 UTC
Post by Martok
As has just been pointed out to me, we all misdiagnosed that example.
Turns out this is not a new question, there is actually a very thorough
treatment of that very issue on SO:
<https://stackoverflow.com/questions/18195312/what-happens-if-you-static-cast-invalid-value-to-enum-class>

(Continue reading after the mention of CWG 1766, the interesting part is what
"representable range" means)

The simple answer, however, is: for a C++ enum with a fixed base type (FPC
analogue: any setting other than {$PACKENUM DEFAULT}), any value of the base
type is valid. C++ enums *without* a fixed base type may have a smaller range
than int, but it is always a power of two. That would mean one could mask with
that size (we already generate the masking sometimes) and make the jumptable
larger, saving a Jx at the expense of a larger table.
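The masking variant mentioned above can be sketched in C as follows, assuming a hypothetical enum whose named values are 0..4 and whose power-of-two range is therefore 0..7. The table is padded to 8 slots with the else handler, and the index is masked with 7 instead of being compared, so there is no conditional jump at all. All names here are illustrative; this is not FPC's code generator.

```c
/* Hypothetical enum with named values 0..4; the power-of-two range
   that covers them is 0..7, i.e. a 3-bit index. */
enum color { RED, GREEN, BLUE, CYAN, MAGENTA };

typedef const char *(*name_handler)(void);

static const char *name_red(void)     { return "red"; }
static const char *name_green(void)   { return "green"; }
static const char *name_blue(void)    { return "blue"; }
static const char *name_cyan(void)    { return "cyan"; }
static const char *name_magenta(void) { return "magenta"; }
static const char *name_unknown(void) { return "unknown"; } /* else branch */

#define INDEX_BITS 3u
#define TABLE_SIZE (1u << INDEX_BITS) /* 8 slots */
#define INDEX_MASK (TABLE_SIZE - 1u)  /* 0x7 */

/* Slots 5..7 have no named enumerator, so they point at the else handler. */
static const name_handler name_table[TABLE_SIZE] = {
    name_red, name_green, name_blue, name_cyan, name_magenta,
    name_unknown, name_unknown, name_unknown
};

const char *color_name(int value)
{
    /* No compare-and-branch: the mask alone keeps the index in 0..7,
       so the indirect call can never leave the table. */
    unsigned int index = (unsigned int)value & INDEX_MASK;
    return name_table[index]();
}
```

The trade-off is the three padding entries in place of the `ja`. Note that a value outside the representable range is not routed to the else handler but aliased onto a named slot (9 & 7 == 1, i.e. "green"); the sketch only guarantees that the jump stays inside the table.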

So the question boils down to: do we want C-Style Enums to behave like in
C-dialects, or just look like they do?

If we do, there's a very simple solution: by setting a fixed $Z option, the
programmer specifically *chose* Low-Level enums. We could just honour that and
be done with it.

--
Martok

Jonas Maebe
2017-07-16 11:40:32 UTC
Post by Ondrej Pokorny
On the one hand you say that the compiler can generate invalid values
and on the other hand you say that the compiler can assume the enum
holds only valid values.
It can assume this because a program's behaviour is only defined if you
initialise your variables first with valid values. This is not just the
case for enums, for any type. If you don't initialise a local longint
variable, the compiler could transform your program as if it was
initialised with -12443, for example. The reason is that the behaviour
of your program is undefined in that case, so any behaviour the compiler
could come up with would be "correct".

The same goes if you use an enum variable that does not contain a valid
value.
Post by Ondrej Pokorny
For now, there is absolutely no range checking
and type safety for enums - so you can't use it as an argument.
There is just as much range checking and type safety for enums as there
is for any other type.
Post by Ondrej Pokorny
1.) Give us full type safe enums with full range checking that CANNOT hold invalid values after any kind of operation (pointer, typecast, assembler ...). Then I am fully with you: keep the case optimization as it is (and introduce more optimizations).
2.) Keep the enums as they are (not type safe) and don't do any optimizations on type safety assumptions on the compiler level. Because there is no type safety.
From my knowledge, the (1) option is utopia in a low-level languages along with Pascal
With your argument, there is no type safety for ansistrings either.
After all, it's very easy to get an ansistring with an invalid initial
value with something like this:

type
  trec = record
    a: ansistring;
  end;
  prec = ^trec;
var
  p: prec;
begin
  getmem(p,sizeof(trec));
  p^.a:='abc'; // undefined behaviour
end.

Nevertheless, the compiler never adds code to check whether an
ansistring actually points to valid ansistring data before it uses it.
It simply assumes that you, as a programmer, made sure it is properly
initialised at all times.

The fact that (Borland/FPC-style) Pascal is a low-level language and
includes features like arbitrary explicit/unchecked typecasts, pointers,
inline assembly, and that it does not force initialisation of memory
with values that are valid for the type of the data you will store in
that location (or even require you to define the type of a memory block
when you allocate it), means that as a programmer you are responsible
for upholding your end of the bargain as far as the type-safety is
concerned. It's not a one-way street, and never has been (for not a
single type).

The compiler can check either statically or dynamically whether there
are any potential errors when you implicitly convert values from one
type to another, in case the source may contain values that are invalid
for the target type. The programmer, on the other hand, is responsible
for ensuring that all source values are properly initialised. As mentioned
before, this is the garbage-in, garbage-out principle.

Your argument is therefore unrelated to enum types. On the other hand,
you are, however, correct that as far as base enums specifically are
concerned, it does seem they should be treated like in C (i.e., as a
shorthand for plain constant declaration). You will still have the same
problem with (at least integer, and possibly also enum) subranges though.


Jonas
DaWorm
2017-07-16 14:34:18 UTC
If the programmer builds a case statement on an enum, that includes all of
the possible enum values, and also includes an else clause, to me it seems
the programmer is implicitly telling the compiler that there is the
possibility of illegal values stored in the enum.

Does the compiler optimize away the else clause in this case? Seems to me
it should not. At least that isn't the behavior I would expect as a user.

Jeff
Jonas Maebe
2017-07-16 14:49:44 UTC
Post by DaWorm
If the programmer builds a case statement on an enum, that includes all
of the possible enum values, and also includes an else clause, to me it
seems the programmer is implicitly telling the compiler that there is
the possibility of illegal values stored in the enum.
Writing unreachable code has never constituted implicitly telling a
compiler anything in any programming language. At best, you will get a
warning from the compiler that the code is unreachable.

You don't want a compiler to start second-guessing what you might have
meant when you wrote something. Everything must be 100% unambiguous.


Jonas
DaWorm
2017-07-16 16:26:06 UTC
Academically that may be true, but in the real world that code wouldn't be
unreachable. I write code that deals with communication protocols all the
time. I can't control what the other side sends. I have two choices.
Write a lot of code to validate each and every element is within the proper
range, or let the handler for each element, that I have to write anyway,
deal with the unexpected values. This is something that comes up a lot
dealing with backwards compatibility where one end of a system is upgraded
and the other is not (think like mobile apps where you cannot force all
phones to switch to the new version simultaneously). A new element will be
added to one side that the other doesn't know about. The older code should
handle this gracefully, and the else of a case is certainly the most
convenient place to do this from the programmer's perspective (or at least
mine anyway).

It can be worked around by casting all parts of the case to integer, but
that leads to ugly code.
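In C the enumerators are plain int constants, so the "switch on the integer" workaround described above comes for free. The sketch below, with a hypothetical message-type enum, dispatches on the raw wire byte so that the branch for unknown (for example newer) values is ordinary, reachable code for every possible input. In Pascal the analogous form would presumably be a case over the ordinal value with `Ord(...)` labels; that translation is an assumption and was not tested in this thread.

```c
#include <stdint.h>

/* Hypothetical message types; a newer peer may send values this build
   does not know about yet. */
enum msg_type { MSG_HELLO = 1, MSG_DATA = 2, MSG_BYE = 3 };

/* Dispatch on the raw wire value instead of an enum-typed variable, so
   the default branch is well-defined for every possible input byte. */
int handle_message(uint8_t raw_type)
{
    switch (raw_type) {
    case MSG_HELLO:
        return 1; /* stand-in for "greet peer" */
    case MSG_DATA:
        return 2; /* stand-in for "process payload" */
    case MSG_BYE:
        return 3; /* stand-in for "close session" */
    default:
        return 0; /* unknown (e.g. newer) type: ignore it gracefully */
    }
}
```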

I'd write up an example but I'm writing from my phone.

Jeff
Martok
2017-07-16 17:24:15 UTC
Does the compiler optimize away the else clause in this case? Seems to me it
should not. At least that isn't the behavior I would expect as a user.
At least it should not do that without telling me. The good thing about case
statements is that they tell me of every other programmer error: missing
elements (if used without else), elements that aren't type-correct, double
elements... but not 'extra' else-blocks.
Academically that may be true, but in the real world that code wouldn't be
unreachable. I write code that deals with communication protocols all the
time. I can't control what the other side sends. I have two choices. Write a
lot of code to validate each and every element is within the proper range, or
let the handler for each element, that I have to write anyway, deal with the
unexpected values.
It can be worked around by casting all parts of the case to integer, but that
leads to ugly code.
That is exactly my discovery use case, and is also why I keep calling this a
remote code execution: it breaks on sensible network-facing code in a scary way.

Example code snippet from libOpenPGP:
<https://pastebin.com/2wsfCXfP>
One of the more obvious places, this pattern repeats all over the project, from
data parsing all the way down to simple enum-to-string for logging
(SignatureTypeToStr, PKAlgorithmToStr).

It's just a coincidence that this is currently partly safe (none of the *ToStr
functions are!); if I were to add a label with value pkPrivate110 (because I
already have pkRSA) and implement the elliptic curve signatures, we would be
jumping to attacker-controlled memory locations. We currently do that in all of
the *ToStr-functions, instead of executing the else-blocks.

That code is completely unambiguous and well-defined if we assume Low-Level
Enumerations (which, coming from BP, I obviously always have).


Martok
Ondrej Pokorny
2017-07-16 17:58:10 UTC
The good thing about case statements is that they tell me of every
other programmer error: missing elements (if used without else)
Off-topic: how can I enable this compiler hint?

When compiling:

program Project1;

type
  TEnum = (one, two);
var
  A: TEnum;
begin
  A := two;
  case A of
    one:
      begin

      end;
  end;
end.

I don't get any kind of warning/hint.

Ondrej
Martok
2017-07-16 19:13:29 UTC
Post by Ondrej Pokorny
The good thing about case statements is that they tell me of every other
programmer error: missing elements (if used without else)
Off-topic: how can I enable this compiler hint?
Erm, I was referring to the "normal" DFA, ie. for function results or variable
initialization.

type
  TEnum = (one, two);

function GetInteger(A: TEnum): Integer;
begin
  case A of
    one: Result:= 1;
  end;
end;

... which for some reason only warns at -O3, and then it's "wrong" sometimes
too, because DFA assumes that enums are Low-Level enums. That was the other
thread on this list recently.

Yeah. Probably a bad argument, sorry.

Martok
Mattias Gaertner
2017-07-16 18:58:15 UTC
On Sun, 16 Jul 2017 10:34:18 -0400
Post by DaWorm
If the programmer builds a case statement on an enum, that includes all of
the possible enum values, and also includes an else clause, to me it seems
the programmer is implicitly telling the compiler that there is the
possibility of illegal values stored in the enum.
IMO the word "illegal" is wrong here.
Usually this is used for future extensions. For example when eventually
the enumtype is extended by a new value. The 'else' part can for
example raise an exception. So the 'else' part is not for handling
"illegal", but not-yet-known values.


Mattias
Ondrej Pokorny
2017-07-02 17:55:47 UTC
- Case statements execute *precisely one* of their branches: the statements of
the matching case label, or the else block otherwise
To support your argument, the current Delphi documentation says the same:

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Declarations_and_Statements

"Whichever caseList has a value equal to that of selectorExpression
determines the statement to be used. If none of the caseLists has the
same value as selectorExpression, then the statements in the else clause
(if there is one) are executed."

According to Delphi documentation, invalid values should point to the
else clause.

Furthermore, it is OK to use invalid values in caseList as well:

program Project1;

{$APPTYPE CONSOLE}

type
  TMyEnum = (one, two);

{$R+}
var
  E: TMyEnum;
begin
  E := TMyEnum(-1);
  case E of
    one, two: Writeln('valid');
    TMyEnum(-1): Writeln('minus one');
  else
    Writeln('invalid');
  end;
end.

The program above writes 'minus one' in Delphi.

Ondrej
Florian Klämpfl
2017-07-02 18:23:36 UTC
Post by Ondrej Pokorny
Post by Martok
- Case statements execute *precisely one* of their branches: the statements of
the matching case label, or the else block otherwise
http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Declarations_and_Statements
"Whichever caseList has a value equal to that of selectorExpression determines the statement to be
used. If none of the caseLists has the same value as selectorExpression, then the statements in the
else clause (if there is one) are executed."
According to Delphi documentation, invalid values should point to the else clause.
program Project1;
{$APPTYPE CONSOLE}
type
TMyEnum = (one, two);
{$R+}
var
E: TMyEnum;
begin
E := TMyEnum(-1);
case E of
one, two: Writeln('valid');
TMyEnum(-1): Writeln('minus one');
else
Writeln('invalid');
end;
end.
The program above writes 'minus one' in Delphi.
And the compiler writes no warning during compilation?

Ondrej Pokorny
2017-07-02 18:29:01 UTC
Post by Florian Klämpfl
And the compiler writes no warning during compilation?
It does indeed.
Post by Florian Klämpfl
Yes, undefined behavior.
I think I got your point :) You are right, sorry for wasting your time.

If we get a convenient way to assign ordinal to enum with range checks,
everything will be fine :)

Ondrej
Martok
2017-07-02 18:33:42 UTC
Post by Ondrej Pokorny
Post by Florian Klämpfl
And the compiler writes no warning during compilation?
It does indeed.
But about something else.
Can we please stop derailing from the main issue here?
Post by Ondrej Pokorny
If we get a convenient way to assign ordinal to enum with range checks,
everything will be fine :)
No, it will not: we would still no longer be able to elegantly pass enums to,
or receive them from, libraries built with other compilers.
But at least it would be defined then, so programmers would know this is an
incompatibility.


Marco van de Voort
2017-07-02 14:48:08 UTC
Post by Martok
It is really hard to write code that interacts with the outside world without
having a validation problem.
Then you are arguing the wrong point. You don't need validation everywhere, but
something you can call to simply confirm an enum has correct values after
reading.

It is not logical to have heaps of checks littered everywhere if the
corruption happens in a defined place (on load).

Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
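The alternative described here, validating once at the point where untrusted data enters the program, can be sketched in C as a checked conversion from the raw integer; once it succeeds, the rest of the code may treat the value as a valid enumerator. The enum, the function name and the assumption that the enumerators are contiguous are all illustrative, not an existing RTL or libc API.

```c
#include <stdbool.h>

/* Hypothetical enum stored on disk as a raw integer.
   Assumes the enumerators are contiguous from KIND_FIRST to KIND_LAST. */
enum record_kind { KIND_TEXT, KIND_BINARY, KIND_LINK };
#define KIND_FIRST KIND_TEXT
#define KIND_LAST  KIND_LINK

/* The single validation point: returns false for any value that is not
   a named enumerator and writes *out only on success. */
bool record_kind_from_raw(long raw, enum record_kind *out)
{
    if (raw < KIND_FIRST || raw > KIND_LAST)
        return false; /* caller decides: reject, log, or use a fallback */
    *out = (enum record_kind)raw;
    return true;
}
```

A reader would call this right after reading the record and handle the failure there, instead of relying on a case else branch deeper in the code to catch the corrupted value.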
Tomas Hajny
2017-07-02 15:03:52 UTC
Post by Marco van de Voort
Post by Martok
It is really hard to write code that interacts with the outside world without
having a validation problem.
Then you are arguing the wrong point. You don't need validation everywhere, but
something you can call to simply confirm an enum has correct values after
reading.
It is not logical to have heaps of checks littered everywhere if the
corruption happens in a defined place (on load).
Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
True. That's why I believe that Read from a (typed) file should perform
such validation - but it doesn't at the moment, as mentioned in my e-mail
in the other thread. :-(

Tomas


_______________________________________________
Michael Van Canneyt
2017-07-02 15:05:23 UTC
Post by Tomas Hajny
Post by Marco van de Voort
Post by Martok
It is really hard to write code that interacts with the outside world without
having a validation problem.
Then you are arguing it wrong. In that case you don't need validation everywhere,
just something you can call to confirm that an enum has correct values after
reading.
It is not logical to have heaps of checks littered everywhere if the
corruption happens in a defined place (on load).
Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
True. That's why I believe that Read from a (typed) file should perform
such validation - but it doesn't at the moment, as mentioned in my e-mail
in the other thread. :-(
IMHO it should not.

By declaring it as a File of Enum, you are telling the compiler that it contains only valid enums.


Michael.
_______________________________________________
Tomas Hajny
2017-07-02 16:06:56 UTC
Post by Michael Van Canneyt
Post by Tomas Hajny
Post by Marco van de Voort
Post by Martok
It is really hard to write code that interacts with the outside world without
having a validation problem.
Then you are arguing it wrong. In that case you don't need validation everywhere,
just something you can call to confirm that an enum has correct values after
reading.
It is not logical to have heaps of checks littered everywhere if the
corruption happens in a defined place (on load).
Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
True. That's why I believe that Read from a (typed) file should perform
such validation - but it doesn't at the moment, as mentioned in my e-mail
in the other thread. :-(
IMHO it should not.
By declaring it as a File of Enum, you are telling the compiler that it
contains only valid enums.
No one can ever ensure that a file doesn't get corrupted or tampered with
on a storage medium. In other words, you cannot check the assumption
mentioned above any earlier than while reading the file. By that logic, typed
files could never be used in any program, because no one could ever ensure
that these files conform to their stated type before their contents enter
a variable of the declared type (and they should be validated before that
point, because that is exactly where the compiler, and possibly also
your program, start assuming type safety).

Moreover, using the same Read for reading from a text file _does_ perform
such checks (e.g. when using Read to read an integer from a text file,
the value read is checked against the given type, and failures are
signalled either as an RTE or as a non-zero IOResult, depending on
the $I state).
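A sketch of the text-file behaviour described above, assuming Free Pascal's usual handling: under {$I-}, a failed numeric conversion in Read is reported through IOResult instead of a runtime error. The file name numbers.txt is a placeholder, and the exact error code is deliberately not asserted.

```pascal
program TextReadValidation;
{$mode objfpc}

var
  T: Text;
  N: Integer;
  Err: Word;
begin
  { Create a small text file whose content is not a valid integer. }
  Assign(T, 'numbers.txt');   { placeholder file name }
  Rewrite(T);
  Writeln(T, 'abc');
  Close(T);

  Reset(T);
  N := 0;
  {$I-}                       { report I/O errors via IOResult, not as a runtime error }
  Read(T, N);
  Err := IOResult;            { reading IOResult also clears the pending error }
  {$I+}
  Close(T);
  Erase(T);

  if Err <> 0 then
    Writeln('Read rejected the text, IOResult = ', Err)
  else
    Writeln('Read accepted ', N);
end.
```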

Tomas


_______________________________________________
Michael Van Canneyt
2017-07-02 16:39:04 UTC
Post by Tomas Hajny
Post by Michael Van Canneyt
By declaring it as a File of Enum, you are telling the compiler that it
contains only valid enums.
No one can ever ensure that a file doesn't get corrupted or tampered with
on a storage medium.
No-one can ensure a memory location cannot get corrupted either.
Post by Tomas Hajny
Moreover, using the same Read for reading from a text file _does_ perform
such checks (e.g. when using Read to read an integer from a text file,
the value read is checked against the given type, and failures are
signalled either as an RTE or as a non-zero IOResult, depending on
the $I state).
Text files by definition are not type safe. The compiler cannot know what they
contain.

By using file of enum (or any data type), you are explicitly telling the compiler it is OK.
The only exception is reference-counted types; the compiler will forbid you
to define

type
  myrecord = record
    a : ansistring;
    b : integer;
  end;

  f = file of myrecord;  { rejected: typed files cannot contain reference-counted types }

Michael.
_______________________________________________
Tomas Hajny
2017-07-02 20:09:52 UTC
Post by Michael Van Canneyt
Post by Tomas Hajny
Post by Michael Van Canneyt
By declaring it as a File of Enum, you are telling the compiler that it
contains only valid enums.
No one can ever ensure that a file doesn't get corrupted or tampered with
on a storage medium.
No-one can ensure a memory location cannot get corrupted either.
I don't think this is true. The operating system should ensure that no
other process corrupts memory locations used exclusively by my program, and
I should make sure that my own program doesn't corrupt them itself.

A file is usually not protected for exclusive use by your own program,
unless it's created by the same program in a locked state and later read
again (still locked) during the same run of that program; let's say that
this pattern isn't a typical use of files, right?
Post by Michael Van Canneyt
Post by Tomas Hajny
Moreover, using the same Read for reading from a text file _does_
perform such checks (e.g. when using Read to read an integer from
a text file, the value read is checked against the given type, and
failures are signalled either as an RTE or as a non-zero IOResult,
depending on the $I state).
Text files by definition are not type safe. The compiler cannot know what
they contain.
I'm not talking about the compiler here, but about the RTL.
Post by Michael Van Canneyt
By using file of enum (or any data type), you are explicitly telling the compiler it is OK.
There isn't much difference between telling the compiler that all values
in a certain file are of a certain type and telling the compiler that the next
value read from that file (text in this case) will conform to a certain
type. Both are typed, and both should provide means of ensuring type safety
while loading the value into the variable of the given type.

Note that I'm not talking about typecasting here, of course; that's something
completely different (and manual checking is absolutely appropriate
there).

Tomas


_______________________________________________
Michael Van Canneyt
2017-07-03 06:28:27 UTC
Post by Tomas Hajny
Post by Michael Van Canneyt
By using file of enum (or any data type), you are explicitly telling the
compiler it is OK.
There isn't much difference between telling the compiler that all values
in a certain file are of a certain type and telling the compiler that the next
value read from that file (text in this case) will conform to a certain
type. Both are typed, and both should provide means of ensuring type safety
while loading the value into the variable of the given type.
I think 'both are typed' is not correct. When reading from a text file, the
RTL is explicitly performing a conversion from text to string/float/integer.

The whole point of "File of enum" is exactly that no such conversion is
necessary, because you know the file contains only enum values.

If you don't know this for sure, then you must use file of integer instead
and check and convert the values.
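A sketch of that suggestion: the data is stored as a file of Integer, and each value is converted to the enum only after a range check. TColor and colors.dat are hypothetical; the sample data deliberately includes 42, which names no TColor element, and the file is deleted at the end.

```pascal
program ReadEnumsViaIntegerFile;
{$mode objfpc}

type
  { Hypothetical enum, used only for illustration. }
  TColor = (clRed, clGreen, clBlue);

var
  F: file of Integer;   { raw storage: makes no promise about enum validity }
  Raw: Integer;
  C: TColor;
begin
  { Write sample data, including one value that names no TColor. }
  Assign(F, 'colors.dat');   { hypothetical file name }
  Rewrite(F);
  Raw := 0;  Write(F, Raw);
  Raw := 2;  Write(F, Raw);
  Raw := 42; Write(F, Raw);
  Close(F);

  Reset(F);
  while not Eof(F) do
  begin
    Read(F, Raw);
    { Check first, convert second: only validated ordinals become TColor. }
    if (Raw >= Ord(Low(TColor))) and (Raw <= Ord(High(TColor))) then
    begin
      C := TColor(Raw);
      Writeln('accepted: ', Ord(C));
    end
    else
      Writeln('rejected: ', Raw);
  end;
  Close(F);
  Erase(F);
end.
```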
Post by Tomas Hajny
Note that I'm not talking about typecasting here, of course; that's something
completely different (and manual checking is absolutely appropriate
there).
In my opinion, the whole point is moot.
I think this kind of check is entirely unnecessary given the definition, as
I have argued. If that argument fails to convince you, so be it.

If you do implement a check for reading enums (or other types) from a typed file,
then please make sure it is only under $R+

Michael.
_______________________________________________
Marco van de Voort
2017-07-02 17:15:55 UTC
Post by Tomas Hajny
Post by Marco van de Voort
Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
True. That's why I believe that Read from a (typed) file should perform
such validation - but it doesn't at the moment, as mentioned in my e-mail
in the other thread. :-(
That slows things down needlessly IMHO. Or do you mean only in $R+?

Most will blockread records anyway.
_______________________________________________
Tomas Hajny
2017-07-02 19:56:00 UTC
Post by Marco van de Voort
Post by Marco van de Voort
Worse, tying it to range check would then have heaps of redundant checking
everywhere, not just enums.
True. That's why I believe that Read from a (typed) file should perform
such validation - but it doesn't at the moment, as mentioned in my e-mail
in the other thread. :-(
That slows things down needlessly IMHO. Or do you mean only in $R+?
$R+ would be sufficient from my point of view, but I'm not sure if that is
possible, because $R+ is usually in effect in the place of declaration and
the checks would probably need to happen inside the Read implementation
(which is already compiled at that point in time). Unlike $I, there's
probably no way for $R to provide feedback to the caller that could be used
for checks around the call.
Post by Marco van de Voort
Most will blockread records anyway.
That's exactly the point. If someone uses BlockRead/BlockWrite for higher
performance and thus uses untyped access, he/she has to perform manual
checks as appropriate. If someone decides to use typed files, he/she
probably prefers type safety over performance, but currently gets neither.
:-(

Tomas


_______________________________________________
Marco van de Voort
2017-07-02 20:49:56 UTC
In our previous episode, Florian Klämpfl said:
Post by Florian Klämpfl
Post by Martok
Honestly, I still don't understand why we're even having this discussion.
Because it is a fundamental question: whether any behavior can be defined at all when a variable
contains an invalid value. I consider a value outside of the declared range invalid; if it is meant
to be valid, change the declaration of the type.
_AND_ remove types that can't have reasonably cheap range checks, like sparse
enums? :-)
_______________________________________________
DaWorm
2017-07-02 21:50:58 UTC
I store record data in files with a checksum (usually a CRC). I BlockRead
them into an array buffer and verify the checksum. If it passes, I assign
the array buffer, via a typecast, to a variable of the record type. If I'm the
only one reading and writing the files, that is usually enough to handle
drive bit rot or transfer errors.

If someone else's code can write the data, I validate everything, either when
reading and assigning to the record type or occasionally before use. Sure,
it's slow, but it's the only safe thing to do. I wouldn't think of offloading
that responsibility onto the compiler.
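A sketch of the checksum-then-typecast pattern described above. The record layout TRec, the helpers SimpleSum and TryLoad, and the byte-sum checksum are hypothetical stand-ins (the post uses a CRC, which detects far more error patterns); the point is only that the buffer is reinterpreted as a record after the integrity check passes.

```pascal
program ChecksumThenCast;
{$mode objfpc}

type
  { Hypothetical on-disk record; packed so its byte layout is fixed. }
  TRec = packed record
    Id: LongInt;
    Kind: Byte;   { kept raw; range-check it before treating it as an enum }
  end;
  TRecBuf = array[0..SizeOf(TRec) - 1] of Byte;

{ Stand-in for a CRC: the sum of all bytes modulo 256. }
function SimpleSum(const Buf: TRecBuf): Byte;
var
  I, S: Integer;
begin
  S := 0;
  for I := Low(Buf) to High(Buf) do
    S := (S + Buf[I]) and $FF;
  Result := Byte(S);
end;

{ Reinterpret the buffer as a TRec only if the checksum still matches. }
procedure TryLoad(const Buf: TRecBuf; Stored: Byte);
var
  R: TRec;
begin
  if SimpleSum(Buf) = Stored then
  begin
    Move(Buf, R, SizeOf(TRec));   { same effect as a variable typecast }
    Writeln('ok: Id = ', R.Id);
  end
  else
    Writeln('checksum mismatch, record rejected');
end;

var
  Src: TRec;
  Buf: TRecBuf;
  Stored: Byte;
begin
  Src.Id := 1234;
  Src.Kind := 2;
  Move(Src, Buf, SizeOf(TRec));   { "write": serialize and remember the checksum }
  Stored := SimpleSum(Buf);

  TryLoad(Buf, Stored);           { intact buffer is accepted }
  Buf[0] := Buf[0] xor $01;       { simulate a single flipped bit }
  TryLoad(Buf, Stored);           { corrupted buffer is rejected }
end.
```

A matching checksum only shows the bytes are unchanged; as the post says, data written by someone else's code still needs field-level validation, for example checking Kind before using it as an enum.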

Jeff
Post by Marco van de Voort
Post by Florian Klämpfl
Post by Martok
Honestly, I still don't understand why we're even having this discussion.
Because it is a fundamental question: whether any behavior can be defined at all when a variable
contains an invalid value. I consider a value outside of the declared range invalid; if it is meant
to be valid, change the declaration of the type.
_AND_ remove types that can't have reasonably cheap range checks, like sparse
enums? :-)
_______________________________________________
Marco van de Voort
2017-07-13 20:24:44 UTC
Post by Martok
Regardless of whether there may be some argument for this language change, I'm
still a firm believer in "don't surprise the user". There is literally no
precedent that this simplification has ever been done in any Pascal compiler
(quite the contrary), and there is no written hint that FPC does it either.
Basically, if people with some 30-ish years of experience (and always keeping up
Personally I think the input validation angle to justify checking enums is
far-fetched. Input validation should be done at the boundaries of
the system, not everywhere.

_______________________________________________
Martok
2017-07-14 00:40:25 UTC
Post by Marco van de Voort
Personally I think the input validation angle to justify checking enums is
far-fetched.
I completely agree with you on that. Although in a different way ;-)

That was just the easily observable breakage of a common pattern. As anyone who
actually read what I wrote after Florian clarified the actual issue will know, I
had already narrowed it down to 'simple' compatibility and self-consistency.

There is a fundamental difference in the type system between a somewhat sensible
(if unexpected) assumption in FPC and a more practical documented definition in
every other Pascal compiler. An assumption that even FPC follows only in this
one single spot.
This is unexpected and breaks unrelated code. That's the problem.


Good night,

Martok

_______________________________________________
Jonas Maebe
2017-07-15 10:40:21 UTC
Post by Martok
There is a fundamental difference in the type system between a somewhat sensible
(if unexpected) assumption in FPC and a more practical documented definition in
every other Pascal compiler. An assumption that even FPC follows only in this
one single spot.
Several times in this thread I've already given examples showing that
this is not true. In several places FPC generates code based on the
assumption that data locations of a particular type (including enums)
will only contain values that are valid for them. For enums, this
manifests itself, among other things, in the absence of generated range
checks in various places (array indexing, assignments), and in
comparisons that get optimised away at compile time because they will
always have the same result at run time according to the type information.

If a data location has a particular type but does not contain a value
that is valid for that type (e.g. because it has not been initialised
with one, or because an invalid value was put there via an explicit type
cast or assembler code), then the result is undefined. Note that
"undefined" does not mean "the code will crash". It is one possibility,
but in the general sense it means "anything could happen".


Jonas
_______________________________________________
Martok
2017-07-15 15:17:18 UTC
Post by Martok
There is a fundamental difference in the type system between a somewhat
sensible (if unexpected) assumption in FPC and a more practical documented
definition in every other Pascal compiler. An assumption that even FPC
follows only in this one single spot.
Several times in this thread I've already given examples showing that
this is not true.
And several times in this thread I've shown that the places you mention will
behave the same whether we have strict enums or not: they are correct for
either interpretation, simply by doing what a developer without knowledge of the
specific compiler internals, but with solid knowledge of the language, has
come to expect.

For example, if I index an array, I know bad things may happen if I don't check
the index beforehand, so I must always do that.
That sometimes no check happens when the compiler generates an array access
somewhere along the way is not very predictable.
and in comparisons that get optimised away at compile time because they will
always have the same result at run time according to the type information.
I've shown that this is not the case for the more obvious expressions in the forum
post linked above.
Several different ways of writing the (apparent) tautology "is EnumVar in
Low(EnumType)..High(EnumType)" all handle out-of-range values (expressly, not as
a side effect of something else). This is especially noteworthy because, with
strict enums, we might as well drop the else block entirely and warn "unreachable
code" in these tests.
If a data location has a particular type but does not contain a value that is
valid for that type (e.g. because it has not been initialised with one, or
because an invalid value was put there via an explicit type cast or assembler
code), then the result is undefined. Note that "undefined" does not mean "the
code will crash". It is one possibility, but in the general sense it means
"anything could happen".
Absolutely true.
However, FPC does not have the luxury of being the first to define and implement
a new language (well, except for $mode FPC and ObjFPC). There is precedent. And
that precedent is Conclusion 1 of my post above: Enums are handled as a
redefinition of the base type with constants for the names. Some intrinsics
(pred/succ) and the use of the type itself (array[TEnumType], set of) use the
enum-ness for something, most don't. There is nothing undefined.
Do not confuse the additional treatment added by {$R+} with the basic defined
behaviour.


Martok
_______________________________________________
Jonas Maebe
2017-07-15 16:38:08 UTC
Post by Martok
For example, if I index an array, I know bad things may happen if I don't check
the index beforehand, so I must always do that.
No, you don't always have to do that. That is the whole point of a type
system.
Post by Martok
That sometimes no check happens when the compiler generates an array access
somewhere along the way is not very predictable.
Array indexing is just a side effect. The basic principle is this:

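{$r+}
type
  tenum = (ea,eb,ec,ed,ef,eg);
  tsubenum = eb..ef;
  tsubenum2 = ec..ef;
var
  a: tsubenum;
  b: tsubenum2;
begin
  b:=tsubenum2(eg);  { explicit cast: no check, b now holds an invalid value }
  a:=b;              { no check generated: every tsubenum2 is a valid tsubenum }
end.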

This will never generate a range check error, because the type
information states that a tsubenum2 value is always a valid tsubenum
value. Array indexing is a special case of this, as semantically the
expression you use to index the array is first assigned to the range
type of the array.

I would assume that this is something that "someone with a solid
knowledge of the language" would expect.
Post by Martok
and in comparisons that get optimised away at compile time because they will
always have the same result at run time according to the type information.
I've shown that this is not the case for the more obvious expressions in the forum
post linked above.
Several different ways of writing the (apparent) tautology "is EnumVar in
Low(EnumType)..High(EnumType)" all handle out-of-range values (expressly, not as
a side effect of something else).
The in-expression may indeed handle this, but plain comparisons are
removed at compile-time:

type
  tsubrange = 6..8;
var
  a: tsubrange;
begin
  a:=tsubrange(10);
  if a>8 then
    writeln('this statement is removed at compile-time, because a > 8 ' +
      'is impossible according to the type information');
end.

It seems we don't do this transformation for enums right now (and only
for integer subtypes), but that's a limitation of the implementation
rather than something that is done by design. And the principle is the same.
Post by Martok
This is especially noteworthy because, with
strict enums, we might as well drop the else block entirely and warn "unreachable
code" in these tests.
Indeed, just like the removal of the comparison above generates a warning.
Post by Martok
However, FPC does not have the luxury of being the first to define and implement
a new language (well, except for $mode FPC and ObjFPC). There is precedent.
At least the precedent in ISO Pascal
(http://www.standardpascal.org/iso7185rules.html) is that you cannot
convert anything else to an enum, and hence an enum by design always
contains a value that is valid for that type (unless you did not
initialise it at all, in which case the result is obviously undefined as well).

And for subranges, it says "It is an error to assign a value outside of
the corresponding range to a variable of that type". Using subrange
values to calculate something else does promote it to the integer type,
but we do that too.

The Extended Pascal standard
(http://www.eah-jena.de/~kleine/history/languages/iso-iec-10206-1990-ExtendedPascal.pdf)
says that enumeration and subrange types are "non-bindable". This means
that they cannot be used with input/output (including files; this avoids
the issue you mentioned with reading invalid values from disk). It does
not really say much else about enumerated types specifically, but they
are of course also ordinal types and for those it says in the section
about Assignment-compatibility (6.4.6):

***
A value of type T2 shall be designated assignment-compatible with a type
T1 if any of the following six statements is true:
...
d) T1 and T2 are compatible ordinal-types, and the value of type T2 is
in the closed interval specified by the type T1.
...
At any place where the rule of assignment-compatibility is used
a) it shall be an error if T1 and T2 are compatible ordinal-types and
the value of type T2 is not in the closed interval specified by the type
T1;
***

That seems pretty clear in terms of stating that having a value that is
outside the range of a type is an error. And "error" is defined as:

***
A violation by a program of the requirements of this International
Standard that a processor is permitted to leave undetected.
***

I.e., undefined behaviour.

It does say that the "range-type" of a subrange-type is the "host-type",
but this range-type is only referenced in very specific contexts, like
when defining assignment compatibility (in a non-quoted part of section
6.4.6 above), and when defining how for-loops must behave (which is a
place where FPC is in fact in error:
https://bugs.freepascal.org/view.php?id=24318 )
Post by Martok
And
that precedent is Conclusion 1 of my post above: Enums are handled as a
redefinition of the base type with constants for the names. Some intrinsics
(pred/succ) and the use of the type itself (array[TEnumType], set of) use the
enum-ness for something, most don't. There is nothing undefined.
Do not confuse the additional treatment added by {$R+} with the basic defined
behaviour.
{$r+} can help with detecting when undefined behaviour would otherwise
occur, like when assigning a value that is out-of-bounds to a subrange
type or an enum. Explicit typecasting disables this aid. It does not
remove the undefined behaviour.


Jonas
_______________________________________________
Martok
2017-07-15 19:34:38 UTC
Post by Jonas Maebe
This will never generate a range check error, because the type
information states that a tsubenum2 value is always a valid tsubenum
value. Array indexing is a special case of this, as semantically the
expression you use to index the array is first assigned to the range
type of the array.
I would assume that this is something that "someone with a solid
knowledge of the language" would expect.
Probably. Subranges are after all explicit subsets of something. But let's not
digress, right? That's not related to the topic at hand.
Post by Jonas Maebe
but plain comparisons are removed at compile-time:
*With* a warning. Two, actually.
Post by Martok
However, FPC does not have the luxury of being the first to define and implement
a new language (well, except for $mode FPC and ObjFPC). There is precedent.
At least the precedent in ISO Pascal
(http://www.standardpascal.org/iso7185rules.html) is that you cannot
convert anything else to an enum, and hence an enum by design always
contains a value that is valid for that type (unless you did not
initialise it at all, in which case the result is obviously undefined as well).
I know this website; it turns out that's not quite what ISO7185 says. The standard
is awfully unspecific about what you can or cannot do with enums. It simply
defines enumerated types as a set of constants with values 0, 1, 2, etc.,
and later gives the compatibility rules you cite below.

But even if we take the web version:
"""Enumerated types are fundamentally different from integer and subrange types
in the fact that they cannot be freely converted to and from each other."""

'fundamentally different from [...] subrange types' - what I said above.

'cannot be freely converted to and from *each other*' - what they mean by that
is that

type y = (red, green, blue);
type day = (mon, tue, wed, thur, fri, sat, sun);
var
  color: y;
begin
  color:= fri;  { incompatible types: fri is a day, not a y }
end.

will not work. I don't think anyone would want that ;-)

In any case, we have mode ISO for being extra-ISO-compatible - there are some
significant differences between Borland Pascal and ISO/IEC already. Probably
that mode should also receive the "non-bindable" limitation you cite from
IEC10206. <off-topic>I just noticed: case..else should be a syntax error there,
it doesn't exist in ISO7185 and should be case..otherwise in IEC10206 - where it
is technically mandatory, because a non-matching argument is a dynamic-violation
(RTE).</off-topic>

We also have modes TP and Delphi, and at least there it is *not* an error to
have an unnamed value in a variable, because (spoken in terms of the ISO) the
ordinal-type of an enumerated-type *is* the base type, not a (potentially
non-consecutive) subrange. I've quoted the relevant parts of the language
references multiple times already.
Low/High (and the compiler-internal analogue of them - cf. function getrange()
in FPC) produce the first/last element, but that's it - for example Pred/Succ
may produce unnamed elements.

type
  TT = (a=2,b,c=7,d,e); // defines constants of type TT for 2,3,7,8,9
{$R+}
var
  t: TT;
begin
  t:= b;
  t:= succ(t);
  Writeln(ord(t)); // writes '4'
end.

Note that FPC doesn't accept this code in mode (Obj)FPC, but correctly accepts it
in mode DELPHI, with the same result as Delphi.


Added after Ondrej's message 20:52: Borland appears to have taken the route of
what he called a 'LOW-LEVEL enumeration' from the very beginning.


Martok




_______________________________________________
Jonas Maebe
2017-07-16 11:17:03 UTC
Post by Martok
Post by Jonas Maebe
This will never generate a range check error, because the type
information states that a tsubenum2 value is always a valid tsubenum
value. Array indexing is a special case of this, as semantically the
expression you use to index the array is first assigned to the range
type of the array.
I would assume that this is something that "someone with a solid
knowledge of the language" would expect.
Probably. Subranges are after all explicit subsets of something. But let's not
digress, right? That's not related to the topic at hand.
I mentioned it because we use exactly the same reasoning for subrange
types as the one you consider completely unacceptable for enumeration
types. Subrange types also consider only part of bit patterns that can
be represented in their allotted memory space as valid, and any other
value stored in there results in undefined behaviour. The case-statement
optimisation happens for those types in exactly the same way, even
though you can also force invalid values into them.

And you also have subranges of enum types. Can any assumptions be made
about those, in your opinion?

But I finally understand where the disconnect comes from. I have always
thought of enums as more or less equivalents of subrange types, simply
with an optional name for the values. You, and indeed the Pascal
standards, treat them differently.

Does that mean that you would consider the same transformation of a
case-statement when using a subrange type as correct? And that putting a
value outside the range of a subrange into such a variable as a
programmer error? (as opposed to doing the same with a non-subrange enum
type?)
Post by Martok
In any case, we have mode ISO for being extra-ISO-compatible - there are some
significant differences between Borland Pascal and ISO/IEC already.
That is true. But I consider this behaviour a fundamental part of a
type-safe language like Pascal. You want to make the language less
strictly typed in order to have defined behaviour when using code that
puts invalid values into variables.
Post by Martok
Note that FPC doesn't accept this code in mode (Obj)FPC, but correctly does so
in DELPHI, with the same result as Delphi.
Maybe we can indeed change it in TP/Delphi modes, although I still think
it defeats the purpose of using enums in the first place.

And it would also require us to conditionalise every future optimisation
based on type, in particular separating the treatment of enums from that
of integers. That's a lot of (future) work and care to deal with what I
still consider to be bad programming.


Jonas
_______________________________________________
Martok
2017-07-16 16:43:02 UTC
Post by Jonas Maebe
And you also have subranges of enum types. Can any assumptions be made
about those, in your opinion?
Does that mean that you would consider the same transformation of a
case-statement when using a subrange type as correct? And that putting a
value outside the range of a subrange into such a variable as a
programmer error? (as opposed to doing the same with a non-subrange enum
type?)
Depends on the compiler version sadly :/

Subranges for TP5 are documented as "don't rely on anything at runtime, we only
check compiletime", TP7 documents "outside range is an RTE (independent of $R
state)", Delphi is documented like TP5 again.

My intuition was shaped by learning the language with D4 (and D3 books), but
I've always found that weird, and it makes subranges a bit pointless.

I would think that
type
  TEnum = (a,b,c);
  TSubEnum = a..c;

should have the same semantics, but at the same time they can't if subranges are
strict and enums are not. I see now where you're coming from.
(I'll get back to that example at the end.)

And then there's bitpacked records...
Post by Jonas Maebe
But I finally understand where the disconnect comes from. I have always
thought of enums as more or less equivalents of subrange types, simply
with an optional name for the values. You, and indeed the Pascal
standards, treat them differently.
Getting back to the terms Ondrej introduced yesterday, I think that "normal"
enums may or may not be High-Level enumerations, but enums with explicit
assignment can *only* be Low-Level enumerations. Can we safely distinguish them
in the compiler? Does it even make sense to add that complexity?

This gets weirder. I think Borland already made that distinction, but... not?
<http://docwiki.embarcadero.com/RADStudio/XE5/en/Simple_Types#Enumerated_Types_with_Explicitly_Assigned_Ordinality>

"""An enumerated type is, in effect, a subrange whose lowest and highest values
correspond to the lowest and highest ordinalities of the constants in the
declaration. [...] but the others are accessible through typecasts and through
routines such as Pred, Succ, Inc, and Dec."""

So that's about the "gaps": they're valid, just unnamed.
But for subranges, they write:

"""incrementing or decrementing past the boundary of a subrange simply converts
the value to the base type."""
So we can also leave the min..max range and transparently drop to the parent
type. This raises a range check error under $R+, _but is valid otherwise_. (* This is the exact same
text as in the TP5 langref *)

Logical conclusion from that: a variable of a subrange of a
1) High-Level enum becomes invalid when we leave the declared enum elements
2) Low-Level enum remains valid by way of dropping to the base type.
Having both variants in the type system is too complex IMO - although it would
be something where the programmer clearly has to state her intentions.


My initial proposed trivial solution was to keep this undefined (maybe document
the difference to BP), and simply change codegen to be undefined-safe normally
and only undefined-unsafe in -O4. I am, however, no longer so sure if that is
really a good solution.

There has to be a reason why everybody else chose Low-Level enums, except that
it is far simpler to implement, right?
Post by Jonas Maebe
And it would also require us to conditionalise every future optimisation
based on type, in particular separating the treatment of enums from that
of integers. That's a lot of (future) work and care to deal with what I
still consider to be bad programming.
Delphi always optimizes based on the full range of the base type:

type
  TB = (a,b,c,d,e); // SizeOf(TB)=1
  TT = a..e;
var
  t: TT;
begin
  t := TT(2);
  if t <= e then              // does not get removed
    if Ord(t) <= 255 then     // 'Condition is always true'
      WriteLn('reached');
end.

Martok
Jonas Maebe
2017-07-17 21:50:27 UTC
Post by Martok
I would think that
type
TEnum = (a,b,c);
TSubEnum = a..c;
should have the same semantics, but at the same time they can't if subranges are
strict and enums are not. I see now where you're coming from.
(I'll get back to that example at the end.)
And then there's bitpacked records...
Indeed, and range checking. I mean, if you have the above declarations,
then what should be the behaviour in the following cases:

1)

{$r+}
var
a,b: TEnum;
begin
a:=tenum(5);
b:=a;
end;

2)

{$r+}
type
tr = bitpacked record
f: TEnum;
end;
var
a: tenum;
r: tr;
begin
a:=tenum(5);
r.f:=a;
end;

3)

(does this trigger a range check error in Delphi?)

{$r+}
var
arr: array[tenum] of byte;
a: tenum;
begin
a:=tenum(5);
arr[a]:=1;
end;

(and then the same with tenum replaced by tsubenum)

Should these silently truncate the values, trigger range errors (i.e.,
any conversion/assignment from an enum type to itself should insert a
range check, rather than only when converting between different types),
or just copy the data (which means that in case 2, enums cannot actually
be bitpacked)?

Defining different behaviour depending on the expression is the road to
madness. After all, in all of these cases, the expression that may give
rise to the range error/truncation is identical: a "conversion" from a
tenum value to the tenum type itself.
Post by Martok
Getting back to the terms Ondrej introduced yesterday, I think that "normal"
enums may or may not be High-Level enumerations, but enums with explicit
assignment can *only* be Low-Level enumerations. Can we safely distinguish them
in the compiler?
Yes.
Post by Martok
Does it even make sense to add that complexity?
I'm not sure.
Post by Martok
"""incrementing or decrementing past the boundary of a subrange simply converts
the value to the base type."""
So we can also leave the min..max range and transparently drop to the parent
type. This raises a range check error under $R+, _but is valid otherwise_. (* This is the exact same
text as in the TP5 langref *)
I guess it's a bit like how with {$Q+} you get overflow errors, and with
{$Q-} you have guaranteed 2's-complement logic (at least on a CPU that
uses 2's complement). On the other hand, it makes subranges completely
useless, unless their declared range results in the compiler allocating
an exact multiple of one memory storage unit (bytes, in our case).

Well, unless of course you consider having base types that are not a
multiple of 8 bits (I don't see any definition of what can constitute a
base type on the Delphi page you linked). Then you would also have to
add overflow checking for non-multiple-of-byte-sized types. And in this
case, you would still need to support out-of-range values up to whatever
fits in the number of bits reserved for said base type, but at least it
would make bitpacking possible. OTOH, in terms of safety or simplicity
of implementation, little or nothing would be gained.
Post by Martok
My initial proposed trivial solution was to keep this undefined (maybe document
the difference to BP), and simply change codegen to be undefined-safe normally
and only undefined-unsafe in -O4. I am, however, no longer so sure if that is
really a good solution.
Undefined is never safe. Undefined is something at the semantic level,
which pervades the entire language and compiler. Optimisations merely
perform additional transformations that also honour those semantics. You
cannot say "this is undefined, but safe at -O0". Even if only because it
may no longer be safe even at -O0 the next year, after adding support
for GIMPLE or LLVM output. Even more likely it may happen because in
general, check conditions (such as "does this need a range check") and
implementations (such as jump tables, that are basically just loading
array entries and then jumping to the address) tend to get factored out
over time.

Either something is defined and fully supported, or it's not.
Something in between cannot exist in any sane programming language nor
in a sane implementation of a compiler implementing said language. Of
course, many programmers like to believe that is in fact possible and
write their programs based on how one compiler (often a single version of it)
compiles them. And then you indeed get rants on LKML about clang and LLVM
and newer gcc versions, while their code was broken all along.
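Under the semantics Jonas describes, the only way to stay on defined ground is never to create an out-of-range enum value in the first place. A minimal sketch of that, assuming the value arrives as a plain Integer from some untrusted source (file, network, C API); `TryToEnum` is a hypothetical helper, not an RTL routine. The bounds test is done on the Integer before any cast, so no assumption about the enum's range can remove it.

```pascal
program ValidateBeforeCast;
{$mode objfpc}

type
  TEnum = (a, b, c);

{ Hypothetical helper: converts an untrusted Integer to TEnum only if it is
  the ordinal of a declared element. The test runs on the Integer, so the
  compiler cannot fold it away based on TEnum's declared range. }
function TryToEnum(Value: Integer; out E: TEnum): Boolean;
begin
  Result := (Value >= Ord(Low(TEnum))) and (Value <= Ord(High(TEnum)));
  if Result then
    E := TEnum(Value)
  else
    E := Low(TEnum);   // give the out parameter a defined value either way
end;

var
  e: TEnum;
begin
  if TryToEnum(5, e) then
    WriteLn('5 accepted')
  else
    WriteLn('5 rejected');            // taken: 5 is outside a..c
  if TryToEnum(1, e) and (e = b) then
    WriteLn('1 maps to b');
end.
```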
Post by Martok
There has to be a reason why everybody else chose Low-Level enums, except that
it is far simpler to implement, right?
I don't know, but I still don't understand why on Earth you would want
them in a strongly typed language.


Jonas
Martok
2017-07-18 11:53:41 UTC
I'll start at the end.
Post by Jonas Maebe
Post by Martok
There has to be a reason why everybody else chose Low-Level enums, except that
it is far simpler to implement, right?
I don't know, but I still don't understand why on Earth you would want
them in a strongly typed language.
I see them as extremely useful as enums with explicit assignments, and indeed
not so much with "automatic" enums.
Explicit Enums then become a way to have assignment-safe (not typesafe in the
true sense of the word) constants - the compiler can tell me that trying to
assign a SignatureAlgorithm name to a CipherAlgorithm field is wrong, for
example. {$SCOPEDENUMS} can also improve readability - I have a Canon EDSDK
import that only really works because of that constellation.
Or take the gazillions of uint constants in OpenGL: grouping them into types by
what they do can reveal when I confuse similar-sounding constants.
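A minimal sketch of that use, assuming FPC's `{$SCOPEDENUMS ON}`. The type names and numeric values are made up for illustration and are not taken from any real protocol or from the EDSDK/OpenGL headers. Assigning one type to the other fails at compile time, while `Ord()` still yields the raw value for the API call.

```pascal
program ExplicitEnumConstants;
{$mode objfpc}
{$SCOPEDENUMS ON}   // elements must be written as TypeName.Element

type
  // Hypothetical identifiers; the values are illustrative only
  TCipherAlgorithm = (Aes128 = 1, Aes256 = 2, ChaCha20 = 3);
  TSignatureAlgorithm = (Rsa = 1, Ecdsa = 3, Ed25519 = 7);

  TSessionConfig = record
    Cipher: TCipherAlgorithm;
    Signature: TSignatureAlgorithm;
  end;

var
  cfg: TSessionConfig;
begin
  cfg.Cipher := TCipherAlgorithm.Aes256;
  cfg.Signature := TSignatureAlgorithm.Ed25519;
  // cfg.Cipher := TSignatureAlgorithm.Ecdsa;   // rejected: incompatible types
  WriteLn(Ord(cfg.Cipher), ' ', Ord(cfg.Signature));   // prints "2 7"
end.
```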

I'll reply to your code examples in a second message because I think you just
uncovered another bug.
Post by Jonas Maebe
Well, unless of course you consider having base types that are not a
multiple of 8 bits (I don't see any definition of what can constitute a
base type on the Delphi page you linked). Then you would also have to
add overflow checking for non-multiple-of-byte-sized types. And in this
case, you would still need to support out-of-range values up to whatever
fits in the number of bits reserved for said base type, but at least it
would make bitpacking possible. OTOH, in terms of safety or simplicity
of implementation, little or nothing would be gained.
I don't think they have bitpacking?
Base type for enums is Byte, Word, Cardinal depending on $Z.
"When you use numeric or character constants to define a subrange, the base type
is the smallest integer or character type that contains the specified range."
and I would assume the enum type for subranges over enums.
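The "smallest integer type that contains the range" rule can be checked directly; FPC follows the same rule for integer subranges. The type names below are arbitrary. It also illustrates Jonas's point: a `0..5` variable physically has room for any value from 0 to 255.

```pascal
program SubrangeSizes;
{$mode objfpc}

type
  TTiny   = 0..5;       // fits in a Byte
  TSigned = -1..100;    // fits in a ShortInt
  TMid    = 0..300;     // needs a Word
  TWide   = 0..70000;   // needs a 32-bit type

begin
  WriteLn(SizeOf(TTiny), ' ', SizeOf(TSigned), ' ', SizeOf(TMid), ' ', SizeOf(TWide));
  // prints "1 1 2 4"
end.
```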

And yes, subranges are a bit useless outside of their declarative meaning and in
set construction.
From the manuals it looks like they tried changing that in between TP5 and TP7
(probably because of the Pascal standardisation?) but went back to the old
relaxed solution for Delphi.

Proposal:
Everything stays as it is for 'automatic' enums.
'Explicit' enums internally have the full range of their base type
(get_min/max_value, getrange return the base type's values), except for two
functions: the Low() and High() intrinsics continue to return the first/last
declared element.

I believe this is entirely in the previously undefined part of the language. It
makes no change to automatic enums and aligns explicit enums with Delphi. Having
the range functions like that means we don't have to touch any optimizer code at
all - it gets the correct bounds. Same for range checking code.
Subranges continue to be strict (as the "convex hull" of the enumeration's
declared values, but also covering unnamed values in between), so nothing
changes for arrays or bitpacked records.
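The proposal can be modelled in a few lines. This is a hypothetical sketch, not compiler code. `TEnumModel`, `MakeModel`, `AssumedLow`, `AssumedHigh` and `LowIntrinsic` are invented names standing in for what get_min/max_value/getrange and the Low() intrinsic would return under the proposal. The sketch also assumes an unsigned base type (Byte, Word or Cardinal depending on $Z, as stated above).

```pascal
program ExplicitEnumRangeModel;
{$mode objfpc}

type
  { Hypothetical description of an enum type; not the compiler's tenumdef. }
  TEnumModel = record
    DeclaredMin, DeclaredMax: Int64;  // ordinals of first/last declared element
    HasExplicitValues: Boolean;       // any "name = value" in the declaration
    BaseSize: Integer;                // storage size in bytes (1, 2 or 4, per $Z)
  end;

function MakeModel(AMin, AMax: Int64; AExplicit: Boolean; ASize: Integer): TEnumModel;
begin
  Result.DeclaredMin := AMin;
  Result.DeclaredMax := AMax;
  Result.HasExplicitValues := AExplicit;
  Result.BaseSize := ASize;
end;

{ Lower bound that optimizer and range checks would assume. }
function AssumedLow(const E: TEnumModel): Int64;
begin
  if E.HasExplicitValues then
    Result := 0                                    // unsigned base type assumed
  else
    Result := E.DeclaredMin;
end;

{ Upper bound that optimizer and range checks would assume. }
function AssumedHigh(const E: TEnumModel): Int64;
begin
  if E.HasExplicitValues then
    Result := (Int64(1) shl (8 * E.BaseSize)) - 1  // full range of the base type
  else
    Result := E.DeclaredMax;
end;

{ Low() keeps reporting the first declared element in both cases. }
function LowIntrinsic(const E: TEnumModel): Int64;
begin
  Result := E.DeclaredMin;
end;

var
  autoEnum, explEnum: TEnumModel;
begin
  autoEnum := MakeModel(0, 2, False, 1);   // like (a,b,c)
  explEnum := MakeModel(1, 7, True, 1);    // like (a=1,b=3,c=5,d=7) under $Z1
  WriteLn(AssumedLow(autoEnum), '..', AssumedHigh(autoEnum));   // 0..2
  WriteLn(AssumedLow(explEnum), '..', AssumedHigh(explEnum));   // 0..255
  WriteLn(LowIntrinsic(explEnum));                               // 1
end.
```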

How about that?

Martok


Martok
2017-07-18 13:26:20 UTC
Post by Jonas Maebe
Post by Martok
And then there's bitpacked records...
Indeed, and range checking. I mean, if you have the above declarations,
Delphi generates no range checking code for enum assignments at all, only for
mutation. All your examples work without error, just maybe not in a way you'd want.
I think this is a bug rather than a deliberate assumption: they do check the range in things
like conditional expressions, just not in proper rangechecks.
Post by Jonas Maebe
1)
{$r+}
var
a,b: TEnum;
begin
a:=tenum(5);
b:=a;
end;
Aliasing should not count as an operation, so no range check code should be inserted at all.
Post by Jonas Maebe
2)
{$r+}
type
tr = bitpacked record
f: TEnum;
end;
var
a: tenum;
r: tr;
begin
a:=tenum(5);
r.f:=a;
end;
By transitivity the same as 1, but we should get an integer overflow error because
bitsizeof(r.f) < bitsizeof(a).
In my latest proposal it would matter if TEnum has explicit values or not. If it
has, f would be large enough; if not, there should be an overflow check because
two variables of the same type may suddenly not have the same size.
Post by Jonas Maebe
(which means that in case 2, enums cannot actually be bitpacked)?
I think only automatic enums can be. IMO:

{$Z1}
bitpacked record
f: (a,b,c,d);
end; => OK, bitsize 2

bitpacked record
f: (a=6,b,c,d); => either error (like use as array index in mode FPC)
end; => or OK, bitsize 8 (because of base type)

TEnum = (a=6,b,c,d); => on its own, bitsize 8
bitpacked record
f: a..d;
end; => OK, bitsize 4 (values up to d=9)

That would be consistent with
a) how sets of enums are packed
b) how a subrange over Integer is bitpacked smaller than Integer.
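The bit sizes in the sketch above follow from the largest ordinal that must fit. A small helper makes the arithmetic explicit. `BitsNeeded` is a hypothetical name; it assumes non-negative ordinals and that a subrange stores its raw ordinal (as the "values up to d=9" annotation implies). It is not the routine FPC uses for packing.

```pascal
program BitsNeededDemo;
{$mode objfpc}

{ Bits required to store every ordinal in 0..MaxOrd (non-negative values only).
  A range with a single value still occupies one bit. }
function BitsNeeded(MaxOrd: Cardinal): Integer;
begin
  Result := 1;
  while (QWord(MaxOrd) shr Result) <> 0 do
    Inc(Result);
end;

begin
  WriteLn(BitsNeeded(3));     // (a,b,c,d): ordinals 0..3 -> 2
  WriteLn(BitsNeeded(9));     // a..d with a=6: values up to 9 -> 4
  WriteLn(BitsNeeded(255));   // whole Byte base type -> 8
end.
```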

That is, however, again an occasion where the only real use of the subrange is
in its declarative use.
Post by Jonas Maebe
3)
(does this trigger a range check error in Delphi?)
{$r+}
var
arr: array[tenum] of byte;
a: tenum;
begin
a:=tenum(5);
arr[a]:=1;
end;
Again, no RC code in Delphi, which would be valid only for auto enums.

Bonus:

4)

{$r+}
type
tenum = (a,b=3,c);
ta = array[tenum] of byte;
var
arr: ta;
v: tenum;
begin
for v := low(arr) to high(arr) do begin
arr[v]:=1;
end;
end.

Still no RC code, but now we can be wrong on both sides. Also the loop happily
iterates over the invalid values 1 and 2.
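To make example 4 concrete: the ordinal range of `tenum` covers five values, but only three are named. A sketch assuming a hand-maintained array `NamedValues` (a hypothetical name) for when only the declared elements should be visited. Both loops run over Integer indices, so they do not depend on how the compiler treats a for-loop over an enum with gaps.

```pascal
program IterateNamedOnly;
{$mode objfpc}

type
  TEnum = (a, b = 3, c);   // ordinals 0, 3, 4; 1 and 2 are unnamed

const
  // Hypothetical hand-maintained list of the declared elements
  NamedValues: array[0..2] of TEnum = (a, b, c);

var
  i, visited: Integer;
begin
  visited := 0;
  for i := Ord(Low(TEnum)) to Ord(High(TEnum)) do
    Inc(visited);                       // 0, 1, 2, 3, 4: includes the gaps
  WriteLn('ordinal range: ', visited);  // 5

  for i := Low(NamedValues) to High(NamedValues) do
    Write(Ord(NamedValues[i]), ' ');    // 0 3 4
  WriteLn;
end.
```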

5)

{$r+}
var
a: TEnum;
b: TSubEnum;
begin
a:=tenum(5);
b:=a;
end;

That should include RC code, and be an error. Indeed that is what FPC currently
generates, Delphi gets it wrong again. Part of the reason why I think this is a
bug on their part.


Martok

Martok
2017-07-16 20:15:39 UTC
Post by Jonas Maebe
Does that mean that you would consider the same transformation of a
case-statement when using a subrange type as correct? And that putting a
value outside the range of a subrange into such a variable as a
programmer error? (as opposed to doing the same with a non-subrange enum
type?)
Hold on, there already is a test for that particular question in the language
itself!

-> Can the type be used as an array index?

---------------------------
{$mode objfpc}
type
TExplEnum = (a=1, b=3, c=5, d=7);
TEnArr = array[TExplEnum] of Byte;
---------------------------
=> Error: enums with assignments cannot be used as array index

Makes sense, after all, what should happen with the gaps? And creating the array
for the entire base type with lots of filler data would potentially be too
memory-consuming.

However:
---------------------------
{$mode objfpc}
type
TExplEnum = (a=1, b=3, c=5, d=7);
TSubEnum = a..d;
TEnArr = array[TSubEnum] of Byte;

begin
WriteLn('SizeOf(TEnArr) = ', SizeOf(TEnArr));
WriteLn('Low(TEnArr) = ', Low(TEnArr), ', ', Ord(Low(TEnArr)));
WriteLn('High(TEnArr) = ', High(TEnArr), ', ', Ord(High(TEnArr)));
end.
---------------------------
SizeOf(TEnArr) = 7
Low(TEnArr) = a, 1
High(TEnArr) = d, 7
---------------------------

That difference was unexpected. At least for me.


In {$mode delphi} (and Delphi), we get the second result for both tests. So
there already is some distinction of enum semantics between modes.



Also:
---------------------------
k:= Pred(c);
---------------------------
Error: succ or pred on enums with assignments not possible

---------------------------
k:= Pred(TSubEnum(c));
---------------------------
Happily compiles.


So, from the compiler's perspective, we cannot rely on the values of enums with
assignments enough to use them as an index (or count them), but we can do so
with subranges - because the subrange in question is effectively 1..7 and
doesn't actually know (or care) about the enum-ness of its host type.
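When an array keyed by such an enum is really needed, one workaround is to map each named value to a dense slot explicitly. A sketch using `TExplEnum` from the first test above; `SlotOf` is a hypothetical helper. Whether its `else` branch is still reached for a value outside a..d is exactly the question this thread is about.

```pascal
program SparseEnumSlots;
{$mode objfpc}

type
  TExplEnum = (a = 1, b = 3, c = 5, d = 7);

{ Hypothetical helper: dense 0-based slot for each named value, so that
  array[0..3] can stand in for the rejected array[TExplEnum]. }
function SlotOf(E: TExplEnum): Integer;
begin
  case E of
    a: Result := 0;
    b: Result := 1;
    c: Result := 2;
    d: Result := 3;
  else
    Result := -1;   // unnamed ordinal such as TExplEnum(2)
  end;
end;

var
  counts: array[0..3] of Integer;
begin
  FillChar(counts, SizeOf(counts), 0);
  Inc(counts[SlotOf(c)]);
  WriteLn(Length(counts), ' slots, c stored in slot ', SlotOf(c));  // 4 slots, slot 2
end.
```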


Huh. Fascinating.
Florian Klämpfl
2017-07-16 20:39:41 UTC
Post by Martok
---------------------------
{$mode objfpc}
type
TExplEnum = (a=1, b=3, c=5, d=7);
TSubEnum = a..d;
TEnArr = array[TSubEnum] of Byte;
begin
WriteLn('SizeOf(TEnArr) = ', SizeOf(TEnArr));
WriteLn('Low(TEnArr) = ', Low(TEnArr), ', ', Ord(Low(TEnArr)));
WriteLn('High(TEnArr) = ', High(TEnArr), ', ', Ord(High(TEnArr)));
end.
---------------------------
SizeOf(TEnArr) = 7
Low(TEnArr) = a, 1
High(TEnArr) = d, 7
---------------------------
That difference was unexpected. At least for me.
Indeed, this is a bug. IMO the declaration of TSubEnum should not be allowed.

Florian Klämpfl
2017-07-16 21:11:59 UTC
Post by Florian Klämpfl
Indeed, this is a bug. IMO the declaration of TSubEnum should not be allowed.
I made a patch and tested it; however, it causes regressions in our tests, so I am not sure it
should be changed.

Ondrej Pokorny
2017-07-16 21:17:56 UTC
Post by Florian Klämpfl
Indeed, this is a bug. IMO the declaration of TSubEnum should not be allowed.
I made a patch and tested it; however, it causes regressions in our tests, so I am not sure it
should be changed.
It must not be changed. Delphi documentation is clear about this case:
http://docwiki.embarcadero.com/RADStudio/XE5/en/Simple_Types#Enumerated_Types_with_Explicitly_Assigned_Ordinality

"""type Size = (Small = 5, Medium = 10, Large = Small + Medium);
defines a type called Size whose possible values include Small,
Medium, and Large, where Ord(Small) returns 5, Ord(Medium) returns 10,
and Ord(Large) returns 15.

*An enumerated type is, in effect, a subrange whose lowest and highest
values correspond to the lowest and highest ordinalities of the
constants in the declaration. In the previous example, the Size type
has 11 possible values whose ordinalities range from 5 to 15. (Hence the
type array[Size] of Char represents an array of 11 characters.) Only
three of these values have names, but the others are accessible through
typecasts and through routines such as Pred, Succ, Inc, and Dec.*"""

Ondrej
Florian Klämpfl
2017-07-15 19:31:36 UTC
Post by Martok
Several different ways of writing the (apparent) tautology "is EnumVar in
Low(EnumType)..High(EnumType)" all handle out-of-range-values (expressly, not as
a side effect of something else).
... only because nobody implemented such an optimization yet.
Post by Martok
Which is especially noteworthy because with
strict enums, we might as well drop the elseblock entirely and warn "unreachable
code" in these tests.
Yes, FPC does this for subrange types, see e.g. https://bugs.freepascal.org/view.php?id=16006

Marco van de Voort
2017-07-14 08:04:51 UTC
Post by Martok
There is a fundamental difference in the type system between a somewhat sensible
(if unexpected) assumption in FPC and a more practical documented definition in
every other Pascal compiler. An assumption that even FPC follows only in this
one single spot.
This is unexpected and breaks unrelated code. That's the problem.
Other Pascals don't have sparse enums?
Martok
2017-07-14 09:50:08 UTC
Post by Marco van de Voort
Post by Martok
There is a fundamental difference in the type system between a somewhat sensible
(if unexpected) assumption in FPC and a more practical documented definition in
every other Pascal compiler. An assumption that even FPC follows only in this
one single spot.
This is unexpected and breaks unrelated code. That's the problem.
Other Pascals don't have sparse enums?
Wait, what do sparse enums have to do with any of that?

But: yes, they do.
Martok
2017-10-04 08:03:52 UTC
Hi all,

another few months, and something funny happened this morning: a tweet shows up
in my timeline which links to this article on efficient codegen for dispatch
with switch statements:
<http://www.cipht.net/2017/10/03/are-jump-tables-always-fastest.html>
Very interesting!

The reason I'm resurrecting this thread is the reference to Arthur Sale's 1981 paper
"The Implementation of Case Statements in Pascal". He compares linear lists,
jumptables, binary search and masksearch (on B6700 machines). The bit on
jumptables (journal-page 933) contains this part:
"""The jump-table itself consists of half-word (3 bytes) unconditional branches,
and must be half-word synchronized. The range-check and indexed branch add up to
23 bytes of instructions on the assumption that the range limit values are
fitted into 8-bit literals, and there is a 0-2 byte padding required to achieve
jump-table synchronization. *The range check is never omitted as the
consequences of a wild branch which lands outside the jump-table are potentially
disastrous.* If the range is r, the space requirements are therefore [...]"""

"potentially disastrous" probably didn't mean security as much as "my room-sized
mainframe crashes", but the point stands...
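Sale's point maps directly onto the `cmp`/`ja` pair shown at the start of this thread. Below is a minimal Pascal emulation of a jump table, assuming a table of function pointers in place of code addresses. `Dispatch`, `Table` and the branch functions are hypothetical names. One unsigned comparison rejects values on both sides of the table, because subtracting the lowest label turns anything below it into a very large unsigned index.

```pascal
program JumpTableDemo;
{$mode objfpc}

type
  TBranch = function: Integer;   // stands in for the code address of one case label

function BranchFor3: Integer; begin Result := 30; end;
function BranchFor4: Integer; begin Result := 40; end;
function BranchFor5: Integer; begin Result := 50; end;
function ElseBranch: Integer; begin Result := -1; end;

const
  LowestLabel = 3;
  // One entry per label value 3..5, indexed by (Value - LowestLabel)
  Table: array[0..2] of TBranch = (@BranchFor3, @BranchFor4, @BranchFor5);

function Dispatch(Value: Integer): Integer;
var
  Index: Cardinal;
begin
  // Like "sub #LOWESTLABEL": values below the lowest label wrap to huge unsigned numbers
  Index := Cardinal(Value - LowestLabel);
  // Like "cmp/ja": one unsigned comparison covers both ends of the table.
  // Dropping this test is the "wild branch" Sale refuses to allow.
  if Index > High(Table) then
    Result := ElseBranch()
  else
    Result := Table[Index]();
end;

begin
  WriteLn(Dispatch(4));    // 40
  WriteLn(Dispatch(9));    // -1: above the table
  WriteLn(Dispatch(1));    // -1: below the table, wrapped to a huge index
end.
```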
--
Martok
