Anton Ertl
2022-10-26 20:51:53 UTC
Today I rewrote SEE-CODE (actually its factor SEE-CODE-RANGE) to
display, e.g.
this                                instead of this
...                                 ...
$7F7E43202948 r> 1->1               $7F7E43202948 r> 1->1
7F7E42ECADE4: mov [r15],r8          $7F7E43202950 fill
7F7E42ECADE7: sub r15,$08           7F7E42ECADE4: mov [r15],r8
7F7E42ECADEB: mov r8,[rbx]          7F7E42ECADE7: sub r15,$08
7F7E42ECADEE: add r13,$08           7F7E42ECADEB: mov r8,[rbx]
7F7E42ECADF2: add rbx,$08           7F7E42ECADEE: add r13,$08
7F7E42ECADF6: mov rcx,-$08[r13]     7F7E42ECADF2: add rbx,$08
7F7E42ECADFA: jmp ecx               7F7E42ECADF6: mov rcx,-$08[r13]
7F7E42ECADFC: nop                   7F7E42ECADFA: jmp ecx
$7F7E43202950 fill                  7F7E42ECADFC: nop
$7F7E43202958 ;s 1->1               $7F7E43202958 ;s 1->1
...                                 ...
I had to get some additional information from the decompiler
primitive, so we now have
    decompile-prim2 ( a_code -- useqlen ustart uend c_addr u ulen )
instead of
    decompile-prim1 ( a_code -- ustart uend c_addr u ulen )
but the existing factoring also was not very amenable to the change,
so I rewrote SEE-CODE-RANGE from scratch, with the following result:
: see-code-range { addr1 addr2 -- } \ gforth
    addr1 0 0 0 ['] noop case { addr nseqlen d: codeblock xt: cr? }
        addr addr2 u>= ?of endof
        addr @ decompile-prim2 { ulen } ulen 0< ?of
            drop 2drop 2drop
            cr? addr simple-see-word
            addr cell+ nseqlen codeblock ['] cr contof
        nseqlen 0= if codeblock discode 0 0 to codeblock ['] noop to cr? then
        cr? addr see-word.addr type { nseqlen1 ustart uend } ulen if
            ustart 4 spaces 0 .r ." ->" uend .
            assert( codeblock nip 0= )
            addr @ ulen to codeblock then
        addr cell+ nseqlen nseqlen1 max 1- codeblock ['] cr
    next-case
    codeblock discode ;
As you can see, it uses locals heavily (the word I have written with
the most locals by far). If you want to follow what's going on on the
stack, it uses the following additional words:
    simple-see-word ( addr -- )
    discode ( c-addr u -- )
    see-word.addr ( addr -- )
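
For readers who have not used Gforth's typed locals, here is a minimal sketch of the two less common kinds used above: d: (a double-cell local) and xt: (a local that executes the stored xt when its name is used). The word GREET and its strings are made up for illustration and are not part of Gforth or of SEE-CODE-RANGE; the sketch assumes a recent Gforth development version that supports xt: locals.

```forth
\ Hypothetical demo word, not from the post or from Gforth.
: greet ( -- )
    s" hello" ['] noop { d: msg xt: sep }
    sep msg type                \ sep runs NOOP, msg pushes c-addr u
    s" world" to msg            \ TO stores a new double into a d: local
    ['] space to sep            \ TO stores a new xt into an xt: local
    sep msg type ;              \ sep now runs SPACE before the text

greet   \ prints: hello world
```

CR? in SEE-CODE-RANGE is such an xt: local: it holds either NOOP or CR, depending on whether a line break is still needed before the next output.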
This code took me a while to write, but apart from the handling of the
CRs, it worked on the first try. The handling of the CRs is
complicated, because DISCODE outputs a CR at the start and at the end
(one might consider this a factoring mistake); we could get rid of
everything to do with CR? if DISCODE did not have this property. I
find that the flexible enhanced CASE control structure is a very
natural way for me to consider the different cases and to keep track
of what has been covered already.
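
To illustrate the enhanced CASE on something smaller, here is a sketch with the same shape as SEE-CODE-RANGE: the loop state is passed on the stack, picked up by a locals definition right after CASE, and each branch either leaves the loop (?OF ... ENDOF), restarts it with new state (?OF ... CONTOF), or falls through to NEXT-CASE. SHOW-CELLS and DEMO are hypothetical names invented for this example, and it assumes a Gforth version that provides ?OF, CONTOF and NEXT-CASE.

```forth
\ Hypothetical example: print the cells in [addr1,addr2),
\ showing zero cells as "." and other cells as numbers.
: show-cells { addr1 addr2 -- }
    addr1 case { addr }
        addr addr2 u>= ?of endof        \ range exhausted: leave the loop
        addr @ 0= ?of                   \ first case: the cell is zero
            ." . " addr cell+ contof    \ restart at CASE with new state
        addr @ .                        \ remaining case: nonzero cell
        addr cell+                      \ new state for the next iteration
    next-case ;

create demo 3 , 0 , 7 ,
demo demo 3 cells + show-cells   \ prints: 3 . 7
```

Each ?OF peels off one case, so the code after it only has to deal with what is left; that is the property that makes it easy to keep track of what has been covered already.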
For comparison, below you find the old code; it uses fewer locals, and
is factored into three words, but is it better (apart from the
functionality difference)? IMO no.
\ also uses decompile-prim ( addr1 -- addr2 )

: see-code-word { addr -- len }
    addr see-word.addr addr @ decompile-prim1 dup >r -1 = if
        2drop 2drop addr cell+ addr @ .word drop
    else
        type 4 spaces swap r@ if
            0 .r ." ->" .
        else
            2drop then
    then
    r> ;

: see-code-next-inline { addr1 addr2 -- addr3 }
    \ decompile starting at addr1 until an inlined primitive is found,
    \ or addr2 is reached; addr3 is addr2 or the next inlined
    \ primitive
    addr1 begin { addr }
        addr addr2 u< while
            addr @ dup decompile-prim = while
                addr cr simple-see-word
                addr cell+
        repeat then
    addr ;

: see-code-range { addr1 addr2 -- } \ gforth
    cr addr1 begin { a }
        a see-code-word { restlen }
        a cell+ addr2 see-code-next-inline { b }
        a @ b addr2 u< while
            ( a @ ) b @ over - discode
            b
    repeat
    \ now disassemble the remaining a @; we derive the length from
    \ its primitive
    restlen discode ;
\ dup decompile-prim dup next-prim swap - discode ;
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net