set-optimizer as an API for per-word optimizer

It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

Yes.

Post by Ruvim
Actually, if we have a definition that compiles some behavior, the
definition that performs this behavior can be created automatically.

Yes, but ... [see below].

Post by Ruvim
: compile-foo ( -- ) ... ;
that appends behavior "foo" to the current definition,
then a word "foo" can be defined as
: foo [ compile-foo ] ;

Actually it's

: compile-foo ( xt -- ) ... ;
: foo recursive [ ' foo compile-foo ] ;

If COMPILE-FOO just drops the xt, you can just pass 0 to COMPILE-FOO.
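
For the simple case where the compiler needs no xt, the pattern can be tried in any standard system. A minimal sketch, with 2DUP as the behaviour; the names COMPILE-2DUP and MY-2DUP are made up for illustration:

```forth
\ Compiler for the behaviour "2dup": appends OVER OVER to the current definition.
: compile-2dup ( -- ) postpone over postpone over ;

\ The executable word is obtained by running the compiler inside a colon definition.
: my-2dup ( x1 x2 -- x1 x2 x1 x2 ) [ compile-2dup ] ;

1 2 my-2dup .s   \ the stack now holds 1 2 1 2
2drop 2drop
```

Here only the compiler is written by hand; MY-2DUP is derived from it mechanically.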

Post by Ruvim
Then, why do we need to define both "foo" and "compile-foo" by hand?
Having one of them, another can be created automatically.

The usual usage of SET-COMPILER is in defining words, e.g.

: constant1 ( n "name" -- )
create ,
['] @ set-does>
[: >body @ ]] literal [[ ;] set-optimizer ;

Here you have the advantage that the constant needs only one cell in
addition to the header. Yes, you have the disadvantage that the
SET-DOES> and SET-OPTIMIZER actions might disagree, leading to
incorrect behaviour. An additional aspect here is that this
definition assumes that the value of the constant is not changed.

Could we avoid the redundancy and the potential disagreement? You
suggest creating a colon definition for "name". How could this work?
We have to store N somewhere. What I can come up with is:

: lit, postpone literal ;
: constant2 ( n "name" -- )
  >r :noname r> ]] drop literal lit, ; [[ >r
  : 0 r@ execute postpone ; r> set-optimizer ;

The definition

5 constant1 five1

takes 6 cells (on a 64-bit machine) in the dictionary, while

5 constant2 five2

takes 16 cells in the dictionary plus 146 Bytes of native code with
the debugging engine on AMD64.

Moreover, I had several bugs in CONSTANT2 until I got it right, but
that could get better with more practice. But will it get better than
the alternative? The code is larger, so that's far from clear.

In any case, it seems to me that the size advantage alone makes the
CONSTANT1 approach preferable. Yes, you describe the same thing
twice, and you may get it wrong in one description while getting it
right in the other, so you have to test both implementations separately
(e.g., interpret the word once, and include it in a colon definition,
and use the same tests on it; maybe we could automate that), but such
bugs are rare.
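
The automation mentioned in passing could look roughly like this. It is only a sketch for words with the stack effect ( x -- y ), meant to be used from the interpreter; VIA-COMPILE, and SAME-RESULT? are hypothetical names, and each call leaves an anonymous definition behind in the dictionary.

```forth
\ Run xt through COMPILE, (the path that SET-OPTIMIZER changes):
\ build a throw-away definition containing only xt, then execute it.
: via-compile, ( i*x xt -- j*x )
  >r :noname r> compile, postpone ; execute ;

\ Compare the EXECUTE path with the COMPILE, path for xt ( x -- y ).
: same-result? ( x xt -- flag )
  2dup execute >r  via-compile,  r> = ;

5 ' 1+ same-result? .   \ prints -1 (true) if both paths agree
```

Applied to the xt of a word created by a defining word such as CONSTANT1, the same check would catch a SET-DOES> action and a SET-OPTIMIZER action that disagree for the tested input.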

Post by Ruvim
A better API for per-word optimization should require the user to define
only the compiler for a word, and the word itself will be created
automatically.
[: postpone over postpone over ;] "2dup" define-by-compiler
compiler: 2dup ]] over over [[ ;
: value
create ,
;

The first two are alternatives, the third one addresses a different
need. For the VALUE example, how does the implementation work; I can
imagine how it works for the 2DUP examples.

Post by Ruvim
BTW, I don't see why xt should be passed to a compiler (as it's done in
"set-compiler"). In what cases it's useful?

It's useful for getting the value of the constant in CONSTANT1. It's
also the interface of COMPILE,. SET-OPTIMIZER only defines what
COMPILE, does for the word that SET-OPTIMIZER is applied to. If
COMPILE, instead DROPped the xt and only then called the word that we
pass with SET-OPTIMIZER, that works nicely for the 2DUP example, but
how would DOES-BY-COMPILER produce the ADDR that is passed to the xt?

- anton

--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

Ruvim

2022-11-23 17:26:00 UTC

It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

Yes.

Post by Ruvim
Actually, if we have a definition that compiles some behavior, the
definition that performs this behavior can be created automatically.

Yes, but ... [see below].

Post by Ruvim
: compile-foo ( -- ) ... ;
that appends behavior "foo" to the current definition,
then a word "foo" can be defined as
: foo [ compile-foo ] ;

Actually it's
: compile-foo ( xt -- ) ... ;
: foo recursive [ ' foo compile-foo ] ;
If COMPILE-FOO just drops the xt, you can just pass 0 to COMPILE-FOO.

Post by Ruvim
Then, why do we need to define both "foo" and "compile-foo" by hand?
Having one of them, another can be created automatically.

The usual usage of SET-COMPILER is in defining words, e.g.
: constant1 ( n "name" -- )
create ,

To me, "lit," looks far more comprehensible than "]] literal [["

Post by Anton Ertl
Here you have the advantage that the constant needs only one cell in
addition to the header. Yes, you have the disadvantage that the
SET-DOES> and SET-OPTIMIZER actions might disagree, leading to
incorrect behaviour. An additional aspect here is that this
definition assumes that the value of the constant is not changed.
Could we avoid the redundancy and the potential disagreement? You
suggest creating a colon definition for "name". How could this work?
: lit, postpone literal ;
: constant2 ( n "name" -- )
  >r :noname r> ]] drop literal lit, ; [[ >r

The definition
5 constant1 five1
takes 6 cells (on a 64-bit machine) in the dictionary, while
5 constant2 five2
takes 16 cells in the dictionary plus 146 Bytes of native code with
the debugging engine on AMD64.

The code is larger since create-does in Gforth avoids duplication of some
code parts (i.e., it utilizes one instance for many definitions), and
since anonymous definitions are too heavy in Gforth. For example,
":noname ;" takes 24 bytes (3 cells) in Gforth, 3 bytes (3/4 cells) in
SwiftForth 3.11.6, and 1 byte (1/4 cells) in SP-Forth/4.

For colon definitions this difference should not be so drastic.

OTOH, if you provide an optimizer that generates longer code instead of
a definition call, it's probably not a problem that the definition
itself takes more space.

Post by Anton Ertl
Moreover, I had several bugs in CONSTANT2 until I got it right, but
that could get better with more practice. But will it get better than
the alternative? The code is larger, so that's far from clear.

Having proper tools, it should not be more difficult.

Post by Anton Ertl
r :noname r> ]] drop literal lit, ; [[ ( xt )

They can be expressed far more simply as follows.

In "constant1":
[: >body @ lit, ;]

In "constant2":
['] lit, partial1

Post by Anton Ertl
In any case, it seems to me that the size advantage alone makes the
CONSTANT1 approach preferable. Yes, you describe the same thing
twice, and you may get it wrong in one description while getting it
right in the other, so you have to test both implementations separately
(e.g., interpret the word once, and include it in a colon definition,
and use the same tests on it; maybe we could automate that), but such
bugs are rare.

The first two are alternatives, the third one addresses a different
need. For the VALUE example, how does the implementation work; I can
imagine how it works for the 2DUP examples.

Ideally, an implementation for such "does-by-compiler" should be
supported by the corresponding implementations for "create" and "does>".

But for the purpose of a PoC we can do it less efficiently. So a
Gforth-specific PoC follows.

In Gforth, ":" and ":noname" affect "latestxt" (which is used by
"set-does>" and "set-compiler"), but "[: ... ;]" doesn't affect it.
So I use the latter construct to create intermediate helper definitions.
The intermediate definitions are needed to adapt the interface of
"does-by-compiler" to the interface of "set-does>" and "set-optimizer"
in Gforth.

: begin-quot ( C: -- quotation-sys colon-sys ) ['] [: execute ;
: end-quot ( C: quotation-sys colon-sys -- xt ) postpone ;] ;

: does-by-compiler ( xt.compiler -- ) \ xt.compiler ( addr.body -- )
latestxt >body >r >r ( R: addr.body xt.compiler )
begin-quot
postpone drop \ the passed addr.body is not needed
2r@ execute
end-quot set-does>
begin-quot
postpone drop \ the passed xt is not needed
r> r> lit, compile,
end-quot set-optimizer
;

A usage example:

: val ( x "name" -- )
create , [: lit, postpone @ ;] does-by-compiler
;

123 val x
x . \ prints 123
: foo x . ; foo \ prints 123
456 ' x >body !
x . \ prints 456
see foo \ should show an optimized variant

Post by Ruvim
BTW, I don't see why xt should be passed to a compiler (as it's done in
"set-compiler"). In what cases it's useful?

It's useful for getting the value of the constant in CONSTANT1.

As I can see, what is actually needed in this case is not an xt but a
data field address.

Do we have an example when an xt itself is needed?

Post by Anton Ertl
It's also the interface of COMPILE,. SET-OPTIMIZER only defines
what COMPILE, does for the word that SET-OPTIMIZER is applied to. If
COMPILE, instead DROPped the xt and only then called the word that we
pass with SET-OPTIMIZER, that works nicely for the 2DUP example, but
how would DOES-BY-COMPILER produce the ADDR that is passed to the xt?

From the formal point of view, "does>" performs partial application at
run-time. It partially applies the part "X" in "does> X ;" to the
ADDR, producing a new definition, and replaces the execution semantics
of the most recent definition with the execution semantics of this new
definition.

In the case of "does-by-compiler", this new definition is created by
means of the passed xt.compiler, and then the execution semantics of the
most recent definition are replaced by those of this new definition.

But it still has to partially apply the xt.compiler to create the full
optimizer. A possible more concise definition for "does-by-compiler":

: does-by-compiler ( xt.compiler -- ) \ xt.compiler ( addr.body -- )
latest-name> >body swap 2>r ( R: addr.body xt.compiler )
begin-quot 2r@ execute end-quot latest-name> replace-behavior
2r> partial1 latest-name> advise-compiler
;

where
partial1 ( x xt1 -- xt2 )
\ xt2 is xt1 partially applied to x
\ This word may use data space.

latest-name> ( -- xt )
\ xt is the execution token of the most recently appended
\ definition in the compilation word list.
\ An ambiguous condition exists if such a definition is absent.

advise-compiler ( xt.compiler xt -- )
\ It makes "compile," only perform xt.compiler
\ when it's applied to xt. It may use data space.
\ An ambiguous condition exists if the execution semantics
\ identified by xt.compiler are distinct from appending
\ the execution semantics identified by xt to the current definition.

replace-behavior ( xt.new xt -- )
\ It makes xt identify the execution semantics identified
\ by xt.new. It may use data space.
\ An ambiguous condition exists if xt is not for a definition
\ that is created by "create".
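
One possible way to get "partial1" in standard Forth, as a sketch only: it spends a whole anonymous colon definition per call, and a system could provide the word far more cheaply.

```forth
\ xt2 pushes x and then performs xt1.
: partial1 ( x xt1 -- xt2 )
  >r >r :noname        \ start an anonymous definition ( xt2 colon-sys )
  r> postpone literal  \ compile x as a literal
  r> compile,          \ compile xt1
  postpone ; ;         \ finish the definition, leaving xt2

10 3 ' + partial1 execute .   \ prints 13
```

This matches the glossary entry above in that it uses data space for every xt2 it creates.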

--
Ruvim

Anton Ertl

2022-11-23 22:07:47 UTC

Post by Anton Ertl
The usual usage of SET-COMPILER is in defining words, e.g.
: constant1 ( n "name" -- )
create ,

To me, "lit," looks far more comprehensible than "]] literal [["

  >r :noname r> ]] drop literal lit, ; [[ >r

So let's see how the size of such a constant would be in SwiftForth
4.0.0-RC52 (64-bit):

defer thunk ok
here :noname drop 5 postpone literal ; is thunk ok
: five2 [ 0 thunk ] ; ok
here swap - . \ 56 ok

see thunk
44CBA0 402637 ( (DEFER) ) CALL E8925AFBFF

thunk +F
44CBAF 5 # EBX MOV BB05000000
44CBB4 40C27A ( LITERAL ) JMP E9C1F6FBFF ok
see five2
44CBD2 -8 [RBP] RBP LEA 488D6DF8
44CBD6 RBX 0 [RBP] MOV 48895D00
44CBDA 5 # RBX MOV 48BB0500000000000000
44CBE4 RET C3 ok

So despite the heavy definitions of Gforth, the Gforth FIVE1 is
smaller than the SwiftForth FIVE2. How small would a SwiftForth FIVE1
be?

here 5 constant five1 here swap - . \ 38 ok

see five1
44CBA0 402528 ( (CONSTANT) ) CALL E88359FBFF
44CBA5 5 ok

Post by Ruvim
Having proper tools, it should not be more difficult.

Post by Anton Ertl
r :noname r> ]] drop literal lit, ; [[ ( xt )

They can be expressed far simpler as following.
['] lit, partial1

Yes, I can use closures rather than :noname for plugging the constant
in. Closures only consist of the stored data plus two cells of
metadata; and they are also much nicer to write. So we get:

: lit, postpone literal ;
: constant2 ( n "name" -- )
[n:d nip lit, ;] >r
: 0 r@ execute postpone ; r> set-optimizer ;

The whole part after the closure should be the same for every such
defining word (but use the proper xt instead of 0), so yes, this could
be much smaller in source code, something like:

: lit, postpone literal ;
: constant2 ( n "name" -- )
[n:d nip lit, ;] define-by-optimizer ;

Concerning the executable code, the need for a colon definition for
every defined word is still a disadvantage of this approach. For
gforth (the debugging engine) on AMD64 I see 11 cells and 47 Bytes of
native code for five2.

Post by Ruvim
BTW, I don't see why xt should be passed to a compiler (as it's done in
"set-compiler"). In what cases it's useful?

It's useful for getting the value of the constant in CONSTANT1.

As I can see, what is actually needed in this case is not an xt but a
data field address.
Do we have an example when an xt itself is needed?

: general-compile, ( xt -- )
postpone literal postpone execute ;

This is the default for the COMPILE, method. It is used whenever no
more specific COMPILE, implementation is installed with SET-OPTIMIZER.
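
What this default amounts to can be seen in a small standard-Forth sketch: the word is compiled as its xt followed by EXECUTE, which is correct for every word but gives the system nothing to optimize. MY-GENERAL-COMPILE,, SQUARE and NINE are illustrative names.

```forth
\ Same definition as above, under a made-up name to avoid clashing
\ with a word the system may already have.
: my-general-compile, ( xt -- )
  postpone literal postpone execute ;

: square ( n -- n*n ) dup * ;

\ Compile SQUARE into NINE through the general method.
: nine ( -- 9 ) 3 [ ' square my-general-compile, ] ;

nine .   \ prints 9
```

The xt is essential here: without it the general method would not know which word to execute.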

- anton

Anton Ertl

2022-11-24 10:32:26 UTC

Post by Anton Ertl
The usual usage of SET-COMPILER is in defining words, e.g.
: constant1 ( n "name" -- )
create ,

...

Post by Anton Ertl
: lit, postpone literal ;
: constant2 ( n "name" -- )
[n:d nip lit, ;] >r
The whole part after the closure should be the same for every such
defining word (but use the proper xt instead of 0), so yes, this could
: lit, postpone literal ;
: constant2 ( n "name" -- )
[n:d nip lit, ;] define-by-optimizer ;
Concerning the executable code, the need for a colon definition for
every defined word is still a disadvantage of this approach. For
gforth (the debugging engine) on AMD64 I see 11 cells and 47 Bytes of
native code for five2.

Some more thoughts in this direction: We can define (tested)

: constant3 ( n "name" -- )
create {: n :}
n [{: n :}d drop n ;] set-does>
n [{: n :}d drop ]] n [[ ;] set-optimizer ;

5 constant3 five3
five3 .
: foo five3 ;
see foo
foo .

11 cells for five3 (with no native code), and worked on first try.
That's 5 cells for the CREATEd word, and 3 cells for each closure.

We have three word headers here, one for "name" and two for the
closures. We could change the header of "name" to be of a closure

[{: n :}d n ;]

and do a variant of the set-optimizer closure that uses the passed xt
to get to the data for "name" and use that (essentially reusing the
[{: n :}d part of the other closure). Something like (does not work):

: constant3 ( n "name" -- )
create [{: n :}d n ;]... ]] n [[ ;] set-xt&optimizer ;

As a result, FIVE3 would consume the same 6 cells as FIVE1. Next, how
can we eliminate the redundancy of specifying separately what happens
on EXECUTE and what happens at COMPILE,?

Looking at defining words for words with read-only parameters, the
usage looks quite systematic:

: +field3 ( n1 n2 "name" -- )
over + swap create [{: n :}d n + ;]... ]] n + [[ ;] set-xt&optimizer ;

: fconstant3 ( r "name" -- )
create [{: f: r :}d r ;]... ]] r [[ ;] set-xt&optimizer ;

So one might think that we can have something like

: fconstant3 ( r "name" -- )
create [{: f: r :}d [ "r" gen-xt&optimizer ] ;

and GEN-XT&OPTIMIZER repeats its parameter at the appropriate places,
generates the code for the rest of the double-closure, and calls
SET-XT&OPTIMIZER.

For words with changeable data (e.g., 2VALUE), we could use the same
approach by treating the address as read-only:

: 2value4 ( n1 n2 "name" -- )
here >r align 2, r> create [{: a :}d [ "a @" gen-xt&optimizer ] ;

However, this needs an extra cell for keeping the address, and TO
would have to find the data by going through the address. A more
appropriate way would be to start with

: 2value3 ( n1 n2 "name" -- )
create 2,
[: 2@ ;] set-does>
[: >body ]] literal 2@ [[ ;] set-optimizer ;

I guess all defining words for changeable data can be implemented with
this scheme, so we might have something like GEN-XT&OPT-WRITABLE,
where we could define 2VALUE3 as:

: 2value3 ( n1 n2 "name" -- )
create 2,
[: [ "2@" gen-xt&opt-writable ] ;

And maybe we can avoid the redundant code fragments occurring in
practice with just these two words.

However, we don't have so many potential uses of SET-XT&OPTIMIZER,
GEN-XT&OPTIMIZER and GEN-XT&OPT-WRITABLE in Gforth that I would
expect implementing such words to ever pay off. There are only
17 occurrences of SET-OPTIMIZER in the Gforth image, not all of them
fit the bill (e.g., the use in FORWARD), and bugs stemming from this
redundancy have not been a problem yet.

Having several closures with shared data might be more generally
useful, though.

- anton

Ruvim

2022-11-29 12:12:16 UTC

[...]

Post by Ruvim
BTW, I don't see why xt should be passed to a compiler (as it's done in
"set-compiler"). In what cases it's useful?

It's useful for getting the value of the constant in CONSTANT1.

As I can see, what is actually needed in this case is not an xt but a
data field address.
Do we have an example when an xt itself is needed?

: general-compile, ( xt -- )
postpone literal postpone execute ;
This is the default for the COMPILE, method. It is used whenever no
more specific COMPILE, implementation is installed with SET-OPTIMIZER.

Well, "compile," can pass xt to the default general method.

But do we have an example when an xt itself is useful for the compiler
that is set via "set-optimizer"?

The mentioned optimizer for "pick" (from another thread) actually does
not require an xt argument, since it knows beforehand that it's xt of
"pick" only.

--
Ruvim

Anton Ertl

2022-11-29 22:08:58 UTC

Post by Ruvim
But do we have an example when an xt itself is useful for the compiler
that is set via "set-optimizer"?

: ;abi-code, ['] ;abi-code-exec peephole-compile, , ;
: does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;

: fold-constants {: xt m xt: pop xt: unpop xt: push -- :}
\ compiles xt with constant folding: xt ( m*n -- l*n ).
\ xt-pop pops m items from literal stack to data stack, xt-push
\ pushes l items from data stack to literal stack.
lits# m u>= if
pop xt catch 0= if
push rdrop exit then
unpop then
xt dup >code-address docol: = if
:,
else
peephole-compile,
then ;

: fold2-1 ( xt -- ) 2 ['] 2lits> ['] >2lits ['] >lits fold-constants ;
' fold2-1 folds * and or xor
' fold2-1 folds min max umin umax
' fold2-1 folds nip
' fold2-1 folds rshift lshift arshift rol ror
' fold2-1 folds = > >= < <= u> u>= u< u<=
' fold2-1 folds d0> d0< d0=
' fold2-1 folds /s mods

and similar for FOLD1-1 FOLD1-2 FOLD2-0 FOLD2-2 FOLD2-3 FOLD3-1
FOLD3-3 FOLD4-1 FOLD4-2. And while FOLD1-0 and FOLD4-4 only have one
client at the moment, this could change, so why make it specific to
that client?

\ optimize +loop (not quite folding)
: replace-(+loop) ( xt1 -- xt2 )
case
['] (+loop) of ['] (/loop)# endof
['] (+loop)-lp+!# of ['] (/loop)#-lp+!# endof
-21 throw
endcase ;

: (+loop)-optimizer ( xt -- )
  lits# 1 u>= if
    lits> dup 0> if
      swap replace-(+loop) peephole-compile, , exit then
    >lits then
  peephole-compile, ;

' (+loop)-optimizer optimizes (+loop)
' (+loop)-optimizer optimizes (+loop)-lp+!#

: opt+- {: xt: op -- :}
lits# 1 = if
0 lits> op ?dup-if
['] lit+ peephole-compile, , then
exit then
action-of op fold2-1 ;
' opt+- folds + -

- anton

albert

2022-11-24 10:08:45 UTC

It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

Yes.

Post by Ruvim
Actually, if we have a definition that compiles some behavior, the
definition that performs this behavior can be created automatically.

Yes, but ... [see below].

Post by Ruvim
: compile-foo ( -- ) ... ;
that appends behavior "foo" to the current definition,
then a word "foo" can be defined as
: foo [ compile-foo ] ;

Actually it's
: compile-foo ( xt -- ) ... ;
: foo recursive [ ' foo compile-foo ] ;
If COMPILE-FOO just drops the xt, you can just pass 0 to COMPILE-FOO.

Post by Ruvim
Then, why do we need to define both "foo" and "compile-foo" by hand?
Having one of them, another can be created automatically.

The usual usage of SET-COMPILER is in defining words, e.g.
: constant1 ( n "name" -- )
create ,

To me, "lit," looks far more comprehensible than "]] literal [["

  >r :noname r> ]] drop literal lit, ; [[ >r

{ } takes 5 CELLS (20/40 bytes) in ciforth. Who cares?
I compile an AHEAD in front of and a THEN after, in order to
use it in the middle of a definition. In interpret mode this is
not necessary, but I do it anyway. Who cares?

Post by Ruvim
For colon definitions this difference should not be so drastic.
OTOH, if you provide an optimizer that generates longer code instead of
a definition call, it's probably not a problem that the definition
itself takes more space.

Hear hear. An optimiser works best if there is simple code to
begin with.

<SNIP>


--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

albert

2022-11-21 10:26:15 UTC

It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

SET-OPTIMIZER seems to be a glorified peep-hole optimiser.
A general optimiser that works on a definition, then inlines and processes
the resulting code, is described in
https://home.hccnet.nl/a.w.m.van.der.horst/forthlecture5.html

<SNIP>


Anton Ertl

2022-11-21 20:28:48 UTC

Post by albert
SET-OPTIMIZER seems to be a glorified peep-hole optimiser.

What makes you think so?

COMPILE, is not even a peephole optimizer; it just compiles a single
word.

- anton

albert

2022-11-22 15:07:28 UTC

Post by albert
SET-OPTIMIZER seems to be a glorified peep-hole optimiser.

What makes you think so?
COMPILE, is not even a peephole optimizer; it just compiles a single
word.

You almost got me! I thought it has something to do with optimisation.
My fault.


Groetjes Albert

Anton Ertl

2022-11-23 12:02:43 UTC

Post by albert

Post by Anton Ertl
COMPILE, is not even a peephole optimizer; it just compiles a single
word.

You almost got me! I thought it has something to do with optimisation.

What makes you think it does not?

SET-OPTIMIZER must not be used for changing the behaviour of
COMPILE,ing the xt (the meaning of COMPILE, is fixed), so the only
correct use is to change the implementation; the primary use is for
improving the generated code (i.e., optimization). A secondary
potential use is instrumentation, but we have not used it for that
yet.

Let's see what happens if we use the most general COMPILE,
implementation instead of the ones installed with SET-OPTIMIZER:

sieve bubble matrix   fib   fft  numbers on a 4GHz Skylake
0.078  0.109  0.044 0.068 0.025  gforth-fast with SET-OPTIMIZER (default)
0.181  0.219  0.138 0.274 0.091  gforth-fast without SET-OPTIMIZER
0.144  0.213  0.100 0.201 0.069  gforth-itc with SET-OPTIMIZER
0.152  0.237  0.102 0.228 0.071  gforth-itc without SET-OPTIMIZER (default)

The invocations for these four measurements were (same order as above):

gforth-fast onebench.fs
gforth-fast -e ":noname ['] lit peephole-compile, , ['] execute peephole-compile, ; is compile," onebench.fs
gforth-itc -e "' opt-compile, is compile," onebench.fs
gforth-itc onebench.fs

- anton

albert

2022-11-28 14:17:20 UTC

Post by albert

Post by Anton Ertl
COMPILE, is not even a peephole optimizer; it just compiles a single
word.

You almost got me! I thought it has something to do with optimisation.

My Ubuntu installs gforth 0.7.3.
It helps if you mention the results with that version for comparison,
to give an impression of the progress you have made with optimisation.
(And we can see the benefit if the gforth team pushes a newer
version to Debian.)
<SNIP>


Anton Ertl

2022-11-29 08:12:10 UTC

Post by albert

Post by Anton Ertl
Let's see what happens if we use the most general COMPILE,
sieve bubble matrix   fib   fft  numbers on a 4GHz Skylake
0.078  0.109  0.044 0.068 0.025  gforth-fast with SET-OPTIMIZER (default)
0.181  0.219  0.138 0.274 0.091  gforth-fast without SET-OPTIMIZER
0.144  0.213  0.100 0.201 0.069  gforth-itc with SET-OPTIMIZER
0.152  0.237  0.102 0.228 0.071  gforth-itc without SET-OPTIMIZER (default)

My Ubuntu installs gforth 0.7.3.
It helps if you mention the results with that version for comparison,

I don't have Ubuntu on that machine, but for comparison here are the
numbers for the Debian 11 distribution of gforth-0.7.3 (first line) and
for current gforth-fast invoked in the Debian-default way (second line).

sieve bubble matrix   fib   fft  numbers on a 4GHz Skylake
0.104  0.144  0.064 0.146        gforth-fast 0.7.3 from Debian 11
0.098  0.125  0.067 0.121 0.042  gforth-fast --no-dynamic with SET-OPTIMIZER
0.078  0.109  0.044 0.068 0.025  gforth-fast with SET-OPTIMIZER (default)
0.181  0.219  0.138 0.274 0.091  gforth-fast without SET-OPTIMIZER
0.144  0.213  0.100 0.201 0.069  gforth-itc with SET-OPTIMIZER
0.152  0.237  0.102 0.228 0.071  gforth-itc without SET-OPTIMIZER (default)

Post by albert
to give an impression of the progress you have made with optimisation.

The difference between 0.7.3 and current is not primarily in code
generation (there the big step was from 0.5 to 0.6), and the code
generation differences are not just on the COMPILE, level.
Nevertheless, let's look at the difference in fib between the first,
second, and third line:

Debian 0.7.3        current       current
(no-dynamic)        no-dynamic    dynamic
dup                 dup           dup         1->1
lit                 lit           lit         1->1
<2>                 #2            #2
<                   < ?branch     < ?branch   1->1
?branch
<140135227933968>   <fib+$58>     <fib+$58>
drop                drop          drop        1->0
lit                 lit           lit         0->1
<1>                 #1            #1
branch              branch        branch      1->1
<140135227934056>   <fib+$A8>     <fib+$A8>
dup                 dup           dup         1->1
1-                  1-            1-          1->1
call                call          call        1->1
<fib>               fib           fib
swap                swap          swap        1->1
lit                 lit+          lit+        1->1
<2>                 #-2           #-2
-
call                call          call        1->1
<fib>               fib           fib
+                   +             +           1->1
;s ok               ;s            ;s          1->1

We see here that current has a static superinstruction for < ?BRANCH
(possible in 0.7 and IIRC 0.6, but the superinstruction was not
there).

We also see that "2 -" is compiled in current into lit+ (with the
operand -2); this is achieved using SET-OPTIMIZER.

And we see the static stack caching states in the dynamic output; on
AMD64 it mostly stays in the default state 1 of having one stack item
in a register, but the sequence DROP LIT is optimized with using state
0 (no stack item in a register) in between them; that was already
possible in 0.7, but was and is not possible with --no-dynamic, and
you only see it nicely with the current SEE-CODE.

Post by albert
(And we can see the benefit if the gforth team pushes a newer
version to Debian.)

We don't push, Debian pulls. Given that the main thing missing from
Gforth-1.0 is to update the documentation to the changes, and given
that Debian does not deliver the documentation, they could just pull a
current snapshot.

Anyway, Debian has been maiming Gforth not just by not delivering the
documentation, but also by making --no-dynamic the default, which
disables a number of Gforth optimizations below the COMPILE, level.
To show what you can expect from Debian/Ubuntu, I also present the
numbers for gforth-fast --no-dynamic.

So you can see the difference between the first and second line as
indication of what improvement you can expect from the Debian
installation of Gforth-1.0, and the difference between the second and
third line of what you can expect from making your own installation of
Gforth-1.0 (plus, you get the documentation). Note that the
improvements from dynamic code generation tend to be larger for bigger
programs; for smaller programs the indirect branch predictors of
modern CPUs work very well, for larger programs a little worse.

- anton

albert

2022-11-29 10:32:46 UTC

In article <***@mips.complang.tuwien.ac.at>,
Anton Ertl <***@mips.complang.tuwien.ac.at> wrote:
<SNIP>

Post by Anton Ertl
Anyway, Debian has been maiming Gforth not just by not delivering the
documentation, but also by making --no-dynamic the default, which
disables a number of Gforth optimizations below the COMPILE, level.
To show what you can expect from Debian/Ubuntu, I also present the
numbers for gforth-fast --no-dynamic.

Yeah. The free software lawyers at Debian have decided that
the info docs as supplied for Gforth (and many more free
programs) do not comply with their "freedom" standards.

Post by Anton Ertl
So you can see the difference between the first and second line as
indication of what improvement you can expect from the Debian
installation of Gforth-1.0, and the difference between the second and
third line of what you can expect from making your own installation of
Gforth-1.0 (plus, you get the documentation). Note that the
improvements from dynamic code generation tend to be larger for bigger
programs; for smaller programs the indirect branch predictors of
modern CPUs work very well, for larger programs a little worse.

Thanks, saved for further study.


Bernd Paysan

2022-12-12 23:53:11 UTC

Yeah. The free software lawyers at Debian have decided that the info
docs as supplied for Gforth (and many more free programs) do not comply
with their "freedom" standards.

Actually, they didn't. They passed https://www.debian.org/vote/2006/vote_001
“GFDL-licensed works without unmodifiable sections are free”.
Gforth's documentation has no invariant section, so it is free.

It's just plain and simple idiocy. Nothing we can fix (we will mention
that the documentation has no invariant section in the next release notes,
though). Well, we do absolutely everything to make a Debian maintainer's
life as easy as possible with the current development system. We even
maintain our own Debian distribution.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=***@X}1GWr!+0qqp_Cn176t4(dQ*
https://bernd-paysan.de/

albert

2022-12-13 09:45:11 UTC

Post by Bernd Paysan

Yeah. The free software lawyers at Debian have decided that the info
docs as supplied for Gforth (and many more free programs) do not comply
with their "freedom" standards.

Actually, they didn't. They passed https://www.debian.org/vote/2006/vote_001
“GFDL-licensed works without unmodifiable sections are free”.
Gforth's documentation has no invariant section, so it is free.

I didn't know that. I thought they used the unmodifiable sections
as an excuse. A poor excuse, because they should then put this invaluable
documentation in the non-free section ("non-free" in scare quotes).

Post by Bernd Paysan
It's just plain and simple idiocy. Nothing we can fix (we will mention
that the documentation has no invariant section in the next release notes,
though). Well, we do absolutely everything to make a Debian maintainer's
life as easy as possible with the current development system. We even
maintain our own Debian distribution.

You are not alone in having issues with Debian maintainers.
I've spent a couple of years arriving at a .deb archive for ciforth
that complies with all their rules. No one is willing to sponsor it
(that is, to look at it and put it in a distribution).
Likely candidates, who sponsor IMHO crappy Forths, didn't bother
to answer.
I generated i86 ciforth in Debian format (.deb) and just
distribute them myself.
It is actually much easier to create a .deb
format than abiding by their zillions of rules and using their
tools.

Fun fact: there is an rpm distribution of the AMD version that appeared
spontaneously, without any effort on my part.

It is a pity that there is no official, newer gforth version that is
spread more widely in distributions.

Groetjes Albert

minf...@arcor.de

2022-11-21 12:03:37 UTC

It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

I agree, too much handcrafting required for my taste. What are compilers
good for? _Automatic_ translation from a source to a target language.
In most Forths that would be assembler or machine code.

Unfortunately or luckily (depending on one's POV) the standard Forth
compiler is ultra-dumb (it can even generate correct code from state-
smart definitions). AFAIU the SET-OPTIMIZER scheme is one way to
enhance or even bypass the dumb compiler for better results through
lots of cryptic meta-information. I would expect a better Forth compiler
to parse run-time & compile-time stack diagrams to generate the greater
part of this meta-information automatically.

IIRC gforth uses vmgen to prepare Forth code for gcc and relies on
the many optimization passes built into gcc:
https://gcc.gnu.org/onlinedocs/gccint/Passes.html
What does SET-OPTIMIZER do better than gcc's optimizers?

Anton Ertl

2022-11-21 20:27:06 UTC

Post by ***@arcor.de

Post by Ruvim
It seems, "set-optimizer" as a basis for such an API is suboptimal,
since you have to describe the same semantics *twice*, and you have a
chance to do it incorrectly.

I agree, too much handcrafting required for my taste. What are compilers
good for? _Automatic_ translation from a source to a target language.

Yes, but someone has to write the compiler. And how do these people
plug it into the Forth system? With SET-OPTIMIZER. And since this is
Forth, they don't bury their tools. So you have the option of using
SET-OPTIMIZER. Or you can rely on what gets done automatically when
you don't use SET-OPTIMIZER. The latter option is correct, but may be slower.
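
To make the two options concrete, here is a minimal sketch modelled on the CONSTANT1 example earlier in the thread. It assumes a development Gforth that provides SET-DOES> and SET-OPTIMIZER; the names CONST-A, CONST-B, USE-A and USE-B are hypothetical and exist only for this illustration.

```forth
\ Sketch only; assumes a development Gforth with SET-DOES> and SET-OPTIMIZER.

\ Variant A: no SET-OPTIMIZER. COMPILE, falls back to whatever the
\ system does by default for a word built with CREATE and SET-DOES>.
: const-a ( n "name" -- )
  create ,
  ['] @ set-does> ;

\ Variant B: same run-time behaviour, plus a hand-written COMPILE,
\ action ( xt -- ) that fetches the value at compile time and
\ compiles it as a literal. This assumes the value never changes.
: const-b ( n "name" -- )
  create ,
  ['] @ set-does>
  [: >body @ ]] literal [[ ;] set-optimizer ;

5 const-a five-a
5 const-b five-b

: use-a ( -- n ) five-a ;   \ compiled through the default COMPILE,
: use-b ( -- n ) five-b ;   \ compiled through the custom optimizer
```

Both USE-A and USE-B should leave 5 on the stack; the difference is only in the code that gets compiled, and in variant B the two descriptions of the behaviour have to be kept in agreement by hand.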

Post by ***@arcor.de
Unfortunately or luckily (depending on one's POV) the standard Forth
compiler is ultra-dumb (it can even generate correct code from state-
smart definitions). AFAIU the SET-OPTIMIZER scheme is one way to
enhance or even bypass the dumb compiler for better results through
lots of cryptic meta-information.

State-smartness does not come into play at the level where
COMPILE,/SET-OPTIMIZER operate; but of course a correct COMPILE,
generates correct code when you pass it the xt of a STATE-smart word,
whether it generates more or less efficient code. Whether the word
that you now have poisoned with STATE-smartness behaves as intended by
you and as expected by others is another story.
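
A small sketch of that last point, using hypothetical words ANSWER, SMART, CALL-SMART, IMM-CALL and T2. It relies on COMPILE, being usable between [ and ] inside a colon definition, which the standard does not guarantee but which is assumed to work in Gforth.

```forth
: answer ( -- n ) 42 ;

\ A STATE-smart word: performs ANSWER when interpreting,
\ compiles ANSWER when STATE is non-zero.
: smart ( -- n | )
  state @ if postpone answer else answer then ; immediate

\ COMPILE, appends the execution semantics of SMART, STATE test included.
: call-smart ( -- n | ) [ ' smart compile, ] ;

call-smart .   \ run while interpreting: SMART takes the ELSE branch

\ Run CALL-SMART while compiling: SMART now takes the IF branch
\ and compiles ANSWER into the definition being built.
: imm-call ( -- ) call-smart ; immediate
: t2 ( -- n ) imm-call ;
t2 .
```

COMPILE, did its job correctly in both cases; what CALL-SMART ends up doing still depends on STATE at the time it runs, which is the part that may surprise its users.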

Post by ***@arcor.de
I would expect a better Forth compiler
to parse run-time & compile-time stack diagrams to generate the greater
part of this meta-information automatically.

COMPILE,/SET-OPTIMIZER works at a different level and sees only one
word each time COMPILE, is invoked.

But I would not expect a Forth compiler that compiles a colon
definition at a time to benefit from stack diagrams. The words to be
compiled determine the stack effect, while the stack effect comment
may be wrong or the compiler may misunderstand it.
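
A trivial illustration, with a hypothetical word SUM-AND-DIFF whose stack comment is deliberately wrong:

```forth
\ The comment promises two results, but the body leaves only one.
: sum-and-diff ( a b -- sum diff ) + ;

3 4 sum-and-diff .   \ only the sum is on the stack; there is no diff
```

The system compiles this without complaint, because the stack effect comes from + and not from the comment.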

Post by ***@arcor.de
IIRC gforth uses vmgen to prepare Forth code for gcc and relies on
the many optimization passes built into gcc:
https://gcc.gnu.org/onlinedocs/gccint/Passes.html
What does SET-OPTIMIZER do better than gcc's optimizers?

It's faster. Therefore it is used at the Forth system run-time, while
gcc is only used at Gforth build time.

Martin Maierhofer did a Forth2C compiler in 1995 that used gcc for
its back end, but it's a proof of concept. There has not been much
interest from the Forth community in this work (or work by others that
compiled through C).

@InProceedings{ertl&maierhofer95,
  author =   {M. Anton Ertl and Martin Maierhofer},
  title =    {Translating {Forth} to Efficient {C}},
  crossref = {euroforth95},
  url =      {http://www.complang.tuwien.ac.at/papers/ertl%26maierhofer95.ps.gz},
  url2 =     {http://www.complang.tuwien.ac.at/papers/ertl%26maierhofer95.pdf},
  abstract = {An automatic translator can translate Forth into C
              code which the current generation of optimizing C
              compilers compiles to efficient machine code. I.e.,
              the resulting code keeps stack items in registers
              and rarely updates the stack pointer. This paper
              presents a simple translation method that produces
              efficient C code, describes an implementation of the
              method and presents results achieved with this
              implementation: The translated code is 4.5--7.5
              times faster than Gforth (the fastest measured
              interpretive system), 1.3--3 times faster than
              BigForth 386 (a native code compiler), and smaller
              than Gforth's threaded code.}
}

@Proceedings{euroforth95,
  title =     "EuroForth~'95 Conference Proceedings",
  booktitle = "EuroForth~'95 Conference Proceedings",
  year =      "1995",
  key =       "EuroForth '95",
  address =   "Schloss Dagstuhl, Germany",
}

- anton

P Falth

2022-11-21 21:23:11 UTC