Discussion:
FPU (x87) code debugging.
R.Wieser
2021-08-06 17:11:21 UTC
Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted
here. If not then please just discard. If they are then please remove this
line. :-)

Hello all,

I've just been writing some basic code to parse a simple float, and realized
that I had no idea how to check if the x87 FPU was empty after I was done -
as a simple measure to check if my code cleaned up correctly.

I've been looking at using the ST bits in the FPU status word, but had to
find that they (unexpectedly) didn't end at zero after I had done my thing :

minimal example:

fld1 ;Load
fld1

fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit

fstp st(0) ;Discard
fstp st(0)

At this point all the ST bits are set, indicating a minus one, not zero.

My questions at this point are:

1) Have I done anything wrong in the above ? I don't think so, but "you
never know" ....

2) How do I, for debugging purposes, check the FPU stack ?

Regards,
Rudy Wieser
Frank Kotler
2021-08-06 17:57:16 UTC
Post by R.Wieser
Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted
here. If not then please just discard. If they are then please remove this
line. :-)
Hi Rudy,
Consider the line removed.
I think x87 is on topic. If necessary, I so rule it. :)
I don't know the answer, though...
Best.
Frank
R.Wieser
2021-08-06 18:55:10 UTC
Frank,
Post by Frank Kotler
Post by R.Wieser
Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted
here. If not then please just discard. If they are then please remove this
line. :-)
Hi Rudy,
Consider the line removed.
To be honest, I had forgotten all about you whitelisting (pardon me if that
isn't PC) people and assumed you would see the message before it would go
into the newsgroup. But hey, now I know I'm on your whitelist too. :-)
Post by Frank Kotler
I think x87 is on topic. If necessary, I so rule it. :)
I wasn't quite sure, as almost everything here is 16-bit assembly, and that is from
a time when x87 FPUs were add-on chips. But thanks.
Post by Frank Kotler
I don't know the answer, though...
No problem. Hopefully someone else here has an idea.

Regards,
Rudy Wieser
DJ Delorie
2021-08-07 01:14:03 UTC
Post by R.Wieser
2) How do I, for debugging purposes, check the FPU stack ?
If your debugger doesn't support it, you can at least use FSAVE/FRSTOR
to fill in a chunk of data which you can then inspect.
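
What "inspecting the chunk" can look like, as a minimal sketch. It is written in Python as an offline model and assumes the 108-byte image that FNSAVE writes in 32-bit protected mode: control, status and tag word in the low halves of the 32-bit slots at offsets 0, 4 and 8, and the eight 10-byte registers from offset 28 onward in ST(0)..ST(7) order. The function name and the synthetic image are made up for illustration.

```python
import struct

def parse_fnsave_image(image: bytes) -> dict:
    """Split a 108-byte FNSAVE image (32-bit protected-mode layout) into fields."""
    if len(image) != 108:
        raise ValueError("expected a 108-byte FNSAVE image")
    # Each environment field occupies a 32-bit slot; the 16-bit word is the low half.
    fcw, fsw, ftw = (struct.unpack_from("<H", image, off)[0] for off in (0, 4, 8))
    # The register data follows the 28-byte environment, 10 bytes per register.
    regs = [image[28 + 10 * i: 38 + 10 * i] for i in range(8)]
    return {"control": fcw, "status": fsw, "tags": ftw,
            "top": (fsw >> 11) & 7, "st": regs}

# Synthetic image of the state right after FNINIT (CW=037Fh, SW=0000h, TW=FFFFh).
image = bytearray(108)
struct.pack_into("<H", image, 0, 0x037F)
struct.pack_into("<H", image, 4, 0x0000)
struct.pack_into("<H", image, 8, 0xFFFF)

state = parse_fnsave_image(bytes(image))
print(hex(state["control"]), state["top"], hex(state["tags"]))
```

On the real FPU, an FRSTOR from the same buffer right after the FNSAVE loads the saved state back.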
R.Wieser
2021-08-07 07:51:16 UTC
DJ,
Post by DJ Delorie
If your debugger doesn't support it
No debugger here (never liked them).
Post by DJ Delorie
you can at least use FSAVE/FRSTOR to fill in a chunk
of data which you can then inspect.
Thanks. That one does give quite a bit of information.

It does have a drawback though: it re-initializes the FPU stack, meaning it
cannot be used while in the middle of a calculation. Any ideas for some
non-destructive probing ?

Regards,
Rudy Wieser
wolfgang kern
2021-08-07 08:40:30 UTC
Post by R.Wieser
DJ,
Post by DJ Delorie
If your debugger doesn't support it
No debugger here (never liked them).
Post by DJ Delorie
you can at least use FSAVE/FRSTOR to fill in a chunk
of data which you can then inspect.
Thanks. That one does give quite a bit of information.
It does have a drawback though: it re-initializes the FPU stack, meaning it
cannot be used while in the middle of a calculation. Any ideas for some
non-destructive probing ?
FXSAVE
__
wolfgang
wolfgang kern
2021-08-07 08:34:10 UTC
Post by R.Wieser
2) How do I, for debugging purposes, check the FPU stack ?
not every debug tool supports FPU. I had to write my own debugger
anyway and it uses FXSAVE to show registers and all status bits.

but how did you check 1) FSTCW ? FXAM/r ? FSTENV ? FSTSW AX ?
too many consecutive fstp will cause stack errors.

The FNSTSW instruction does not check for possible floating-point
exceptions before copying the image of the x87 status register.

FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
and FFREE/r is my way to empty a specific register.

I actually hate this stupid stack-up/dn design, an overall ST(n)
would work just fine with far fewer doubtful quirks.
meanwhile we got SSE/AVX and AMD may remove FPU from chip soon.
__
wolfgang
R.Wieser
2021-08-07 10:11:43 UTC
Wolfgang,
Post by wolfgang kern
but how did you check 1)
I read the Status Word, using FNSTSW. From there I isolated the ST bits.

Thanks for mentioning FXAM. Something I already thought of being handy to
have, but didn't know the name of. :-)
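
For picking the Status Word apart, a small decoder can help. This is a sketch in Python (an offline model, not FPU code) that assumes the standard status-word layout: exception flags in bits 0-5, C0/C1/C2 in bits 8-10, TOP in bits 11-13 and C3 in bit 14. The function names are made up and the example values are synthetic, not captured from a real run. The second function reads C3/C2/C0 the way FXAM sets them, where the pattern 1/0/1 reports ST(0) as empty.

```python
def decode_status_word(sw: int) -> dict:
    """Break a 16-bit x87 status word into its fields."""
    return {
        "exceptions": sw & 0x3F,        # IE, DE, ZE, OE, UE, PE (bits 0-5)
        "stack_fault": (sw >> 6) & 1,
        "error_summary": (sw >> 7) & 1,
        "c0": (sw >> 8) & 1,
        "c1": (sw >> 9) & 1,
        "c2": (sw >> 10) & 1,
        "top": (sw >> 11) & 7,          # the "ST bits"
        "c3": (sw >> 14) & 1,
        "busy": (sw >> 15) & 1,
    }

# Classes reported by FXAM in C3:C2:C0 (C1 holds the sign of ST(0)).
FXAM_CLASSES = {
    0b000: "unsupported", 0b001: "NaN", 0b010: "normal", 0b011: "infinity",
    0b100: "zero", 0b101: "empty", 0b110: "denormal",
}

def fxam_class(sw: int) -> str:
    """Interpret the condition codes of a status word stored right after FXAM."""
    f = decode_status_word(sw)
    key = (f["c3"] << 2) | (f["c2"] << 1) | f["c0"]
    return FXAM_CLASSES.get(key, "reserved")

print(decode_status_word(0x3800)["top"])  # TOP field of a word with bits 11-13 set
print(fxam_class(0x4100))                 # C3=1, C2=0, C0=1
print(fxam_class(0x0400))                 # C3=0, C2=1, C0=0
```

Running FXAM and then storing the status word with FNSTSW therefore answers "is ST(0) empty?" without touching the stack contents.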
Post by wolfgang kern
FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
and FFREE/r is my way to empty a specific register.
I already found (and used, just before the code I posted) FNINIT. But that
just drops all "left over" variables and error flags. Not something I want
to finish a calculation with ...

As for FFREE ? I'm not sure I understand its worth - other than to perhaps
delete the bottom-of-stack variable (and even then), as in all other cases
it would create a "hole" on the stack, which I then still would have to
reckon with. :-|
Post by wolfgang kern
I actually hate this stupid stack-up/dn design, an overall ST(n)
would work just fine with far fewer doubtful quirks.
:-) Agreed. But as I have to work with what the 'puter offers me I have
no other choice than to deal with it.

[in regard to FSAVE]
Post by wolfgang kern
Post by R.Wieser
It does have a drawback though: it re-initializes the FPU stack, meaning
it cannot be used while in the middle of a calculation. Any ideas for some
non-destructive probing ?
FXSAVE
Thanks again.


Blimey! I just realized (did some "that's quaint, what happens if I do
{this}" probing) that the "ST(x)" argument is relative to the "Stack top"
(status word, ST bits). In hindsight that makes sense, but wasn't expected.
It does make the "Stack Top" value useless for a quick "is it empty" test
though.
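
The stack-relative addressing can be replayed in a toy model. This is a sketch in Python that assumes the FPU starts in the FNINIT state (TOP = 0, everything tagged empty) and ignores exceptions and real arithmetic; the class and method names are invented for illustration. It runs the load/store sequence from the minimal example and records TOP and the number of occupied registers after each instruction. The final TOP value naturally depends on where TOP started.

```python
class X87Model:
    """Toy model of TOP and the empty/valid state of the eight registers."""

    def __init__(self):
        self.top = 0                # state after FNINIT
        self.regs = [None] * 8      # None means "tagged empty"

    def phys(self, i: int) -> int:
        """Physical register number behind ST(i)."""
        return (self.top + i) & 7

    def fld(self, value: float) -> None:
        """Push: decrement TOP first, then store into the new ST(0)."""
        new_top = (self.top - 1) & 7
        if self.regs[new_top] is not None:
            raise OverflowError("stack overflow (simplified)")
        self.top = new_top
        self.regs[new_top] = value

    def fstp(self, i: int) -> None:
        """Store ST(0) into ST(i), then pop (tag old ST(0) empty, increment TOP)."""
        value = self.regs[self.phys(0)]
        if value is None:
            raise IndexError("stack underflow (simplified)")
        self.regs[self.phys(i)] = value
        self.regs[self.phys(0)] = None
        self.top = (self.top + 1) & 7

    def used(self) -> int:
        return sum(r is not None for r in self.regs)

fpu = X87Model()
trace = []
for op, arg in [("fld", 1.0), ("fld", 1.0), ("fstp", 2), ("fstp", 0), ("fstp", 0)]:
    getattr(fpu, op)(arg)
    trace.append((fpu.top, fpu.used()))
print(trace)
```

In the model, FSTP ST(2) is a store plus a pop rather than a swap, so the sequence pops three times after pushing twice and ends with every register empty while TOP is not zero.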

Regards,
Rudy Wieser
Robert
2021-08-07 14:19:22 UTC
Post by R.Wieser
Moderator, Frank : I'm not sure if questions about the x87
FPU are permitted here. If not then please just discard.
If they are then please remove this line. :-)
Hello all,
I've just been writing some basic code to parse a simple
float, and realized that I had no idea how to check if the
x87 FPU was empty after I was done - as a simple measure
to check if my code cleaned up correctly.
You will need FSAVE/FRSTOR (and variants) if you use
the x87. Your first FLD will clobber the stack top,
which might be OK only if it is empty.
Post by R.Wieser
I've been looking at using the ST bits in the FPU status word, but had to
fld1 ;Load
fld1
fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit
fstp st(0) ;Discard
fstp st(0)
At this point all the ST bits are set, indicating a minus one, not zero.
As another poster has said, I don't think the x87 automagically
sets value flags (as x86 does) and needs FXAM. FSTSW=FF sounds
like an empty x87.
Post by R.Wieser
1) Have I done anything wrong in the above ? I don't think
so, but "you never know" ....
2) How do I, for debugging purposes, check the FPU stack ?
Dump and examine in main memory. Like the Hewlett-Packard
Reverse Polish Notation calculators it was modelled on,
the x87 is meant for crunching together, not picking apart.

-- Robert
R.Wieser
2021-08-07 15:51:41 UTC
Robert,
Post by Robert
You will need FSAVE/FRSTOR (and variants)
Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets the FPU stack,
and for some reason I can't get the FXSAVE to work (my assembler shows its
age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....."
sequence crashes the program).
Post by Robert
Your first FLD will clobber the stack top,
I don't get that - why only the first one, and why would it clobber (the
value at) the stack top ?
Post by Robert
As another poster has said, I don't think the x87 automagically
sets value flags
I don't quite get this either. Value flags ? I'm reading the "Status
Word" and in it look at the ST bits (at 11-13).

Remark : I later found out/realized that the "Stack Top" is just the
starting offset for the ST(x) arguments. IOW : what's in it isn't really
relevant.
Post by Robert
Dump and examine in main memory.
:-) The problem was that I had no idea that I could or how to do that .

Of course it didn't help that I got confused by (and by it focussed on) the
"Stack Top" value. :-\

Regards,
Rudy Wieser
Robert
2021-08-08 03:22:58 UTC
Post by R.Wieser
Robert,
Post by Robert
You will need FSAVE/FRSTOR (and variants)
Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets
the FPU stack, and for some reason I can't get the FXSAVE to work
(my assembler shows its age by not knowing the opcode, and trying
to use a "db 0Fh,0AEH, ....." sequence crashes the program).
It might have some safeguards against executing data :)
Post by R.Wieser
Post by Robert
Your first FLD will clobber the stack top,
I don't get that - why only the first one, and why would
it clobber (the value at) the stack top ?
The stack is eight FP registers, any load pushes the one on
the top into the bit bucket. Actually, I believe the registers
are a circular file, and the load overwrites and decrements TOS.
Post by R.Wieser
Post by Robert
As another poster has said, I don't think the x87 automagically
sets value flags
I don't quite get this either. Value flags ? I'm reading the
"Status Word" and in it look at the ST bits (at 11-13).
Aren't those three bits (values 0-7) the Top-of-Stack pointer?
People sometimes compare the FPSW with the x86 flags register.
It is not.
Post by R.Wieser
Remark : I later found out/realized that the "Stack Top"
what's in it isn't really relevant.
Exactly.
Post by R.Wieser
Post by Robert
Dump and examine in main memory.
:-) The problem was that I had no idea that I could or how to do that .
Well, debugging always requires more space. x86 assumes
sufficient stack space (or switches to privileged memory).

34 years ago I wrote an extension to MS-DOS DEBUG.COM to
examine the x87. Converting binary FP to decimal FP was hard.
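
For anyone attempting that conversion today, here is a sketch of decoding one 10-byte register image in Python. It assumes the usual 80-bit extended layout: a 64-bit significand with an explicit integer bit, then a 15-bit exponent biased by 16383, then the sign bit, all little-endian. The function name is made up, and infinities and NaNs are simply rejected. Using exact fractions avoids rounding until the very last step.

```python
from fractions import Fraction

def extended_to_fraction(raw: bytes) -> Fraction:
    """Exact value of a finite 80-bit x87 extended-precision number."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    significand = int.from_bytes(raw[:8], "little")   # integer bit is bit 63
    sign_exp = int.from_bytes(raw[8:], "little")
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN")
    if exponent == 0:
        exponent = 1          # denormals share the scale of the smallest normal
    # The significand is an integer scaled by 2**-63, the exponent is biased.
    return sign * Fraction(significand) * Fraction(2) ** (exponent - 16383 - 63)

one = bytes.fromhex("0000000000000080ff3f")          # 1.0, as FLD1 would load it
minus_two_and_a_half = bytes.fromhex("00000000000000a000c0")

print(extended_to_fraction(one))
print(float(extended_to_fraction(minus_two_and_a_half)))
```

From the exact fraction, decimal digits can be produced with plain integer arithmetic, for example by multiplying the numerator by a power of ten before dividing.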
Post by R.Wieser
Of course it didn't help that I got confused by (and by it focussed on)
the "Stack Top" value. :-\
Well, quite forgivable. The x87 is focussed on the stack.

-- Robert
R.Wieser
2021-08-08 08:11:47 UTC
Robert,
Post by Robert
It might have some safeguards against executing data :)
I've used the "trick" before, so I don't think so. Currently I'm torn
between the possibilities that the processor I'm using might not have
that instruction, that I'm simply bungling up or that there is some kind of
memory alignment involved (the latter one would not be the first time I've
run into it).

Is there any possibility you could take a look at and post what code gets
generated for an "FXSAVE {register pointer}" ?
Post by Robert
Post by R.Wieser
I don't get that - why only the first one, and why would
it clobber (the value at) the stack top ?
The stack is eight FP registers, any load pushes the one
on the top into the bit bucket.
True. But such a push would only clobber anything if the (circular) stack
is completely full.
Post by Robert
Actually, I believe the registers are a circular file,
It has to be, as my example code works : after the second FLD1 the TOS is 6.
But I can still execute a FSTP ST(2) ,which seemingly points at 6+2 = 8.
Post by Robert
and the load overwrites and decrements TOS.
The info to, for instance, FLD mentions decrementing first, then storing
(which is why I didn't understand your "clobbering" remark).
Post by Robert
Aren't those three bits (values 0-7) the Top-of-Stack pointer?
Yep. I was assuming that that value would (implicitly) tell me how many
values were placed on the stack. Turns out it doesn't. :-\
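
What does tell how many values are on the stack is the tag word, which FNSTENV and FNSAVE store alongside the status word. This is a sketch in Python that assumes the full 16-bit tag word, with two bits per physical register (register 0 in bits 0-1) and binary 11 meaning "empty". The function name and the example values are made up for illustration.

```python
def occupied_registers(tag_word: int) -> list:
    """Physical register numbers whose 2-bit tag is not 'empty' (binary 11)."""
    return [r for r in range(8) if (tag_word >> (2 * r)) & 0b11 != 0b11]

# After FNINIT every tag is 11b, so nothing is occupied.
print(occupied_registers(0xFFFF))

# After two loads from that state, physical registers 7 and 6 hold values
# (tag 00b = valid), the other six are still empty.
print(occupied_registers(0x0FFF))
```

An "is the FPU empty?" check then boils down to comparing the tag word with FFFFh. As far as I know FNSTENV leaves the stack alone but masks all exceptions afterwards, so the control word may need reloading with FLDCW.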
Post by Robert
People sometimes compare the FPSW with the x86 flags register.
It is not.
Similar perhaps (both contain status flags), but (of course) not the same.
Post by Robert
34 years ago I wrote an extension to MS-DOS DEBUG.COM
to examine the x87.
I'm not sure what you mean with an 'extension' (wasn't aware that Debug
supported such a thing), but years ago I wrote something for it (using
memory patching) so it could deal with a few more opcodes.
Post by Robert
Converting binary FP to decimal FP was hard.
That's something I still have to take a look at. Just not at this moment.
:-)

Regards,
Rudy Wieser
Robert Prins
2021-08-08 11:26:25 UTC
You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit)
you're running this code on....

David Lindauer's GRDB (DOS) can show the contents of FPU registers, and as you
are/were a Pascal user, so can, I think, Delphi. Virtual Pascal can definitely do
it, I use the (sadly) wrapping code below:

{************** Copyright (C) Robert AH Prins 2018-2018 ****************
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3, or (at your option) *
* any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the Free Software *
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA *
************************************************************************
+------------+---------------------------------------------------------+
| Date | Major changes |
+------------+---------------------------------------------------------+
| | |
+------------+---------------------------------------------------------+
| 2018-09-30 | Add x_int3 to selectively enable debug code |
+------------+---------------------------------------------------------+
| 2018-08-31 | Initial version |
+------------+---------------------------------------------------------+
************************************************************************
* DEBUG.PAS *
* *
* This unit contains some code that enables viewing of extended (XMM & *
* YMM) registers in various formats. *
***********************************************************************}
unit debug;

{============================} interface {=============================}
const x_int3: boolean = false;

type
r_fpu = record { 16}
st : extended;
zz : array [0..5] of byte;
end;

r_mmx = record { 16}
case integer of
1: (_by: array [0..7] of byte;
z1 : array [0..7] of byte);
2: (_in: array [0..3] of shortint;
z2 : array [0..7] of byte);
3: (_lo: array [0..1] of longint;
z3 : array [0..7] of byte);
4: (_si: array [0..1] of single;
z4 : array [0..7] of byte);
5: (_do: array [0..0] of double;
z5 : array [0..7] of byte);
6: (_ch: array [0..7] of char;
z0 : array [0..7] of byte);
end;

r_xmm = record { 16}
case integer of
1: (_by: array [0..15] of byte);
2: (_in: array [0.. 7] of shortint);
3: (_lo: array [0.. 3] of longint);
4: (_si: array [0.. 3] of single);
5: (_do: array [0.. 1] of double);
6: (_ch: array [0..15] of char);
end;

xsave_hdr = array [0..63] of byte; { 64}

fpu = array [0..7] of r_fpu; { 128}
mmx = array [0..7] of r_mmx; { 128}
xmm = array [0..7] of r_xmm; { 128}

xsptr = ^a_xs;
a_xs = record
case integer of
1: (legacy : array [0..159] of char; { 160} // raw legacy data
xmm_32 : xmm; { 128} // XMM0-7 (low part of YMM0-7)
xmm_64 : xmm; { 128} // XMM8-15 (low part of YMM8-15) (AMD64)
res_416 : array [0..95] of byte; { 96} // rest of the 512-byte legacy area
xsave_hdr: xsave_hdr; { 64} // Storage bitmap for additional data (offset 512)
ymm_32 : xmm; { 128} // YMM0-7 (high part, low in XMM0-XMM7)
ymm_64 : xmm); { 128} // YMM8-15 (high part, low in XMM8-XMM15) (AMD64)

2: (fcw : smallword; { 2} // x87 FPU control word
fsw : smallword; { 2} // x87 status word
ftw : byte; { 1} // x87 tag word
res_1 : byte; { 1}
fop : smallword; { 2} // x87 last opcode
fip : longint; { 4} // x87 EIP
fcs : smallword; { 2} // x87 CS:
res_1_x64: smallword; { 2} // + previous: RIP (AMD64)
fdp : longint; { 4} // x87 data pointer
fds : smallword; { 2} // x87 DS:
res_2_x64: smallword; { 2} // + previous: DIP (AMD64)
mxcsr : longint; { 4} // SSE control word
mxcsr_msk: longint; { 4}

case integer of
3: (fpu: fpu); { 128} // x87 FPU registers
4: (mmx: mmx)); { 128} // x86 MMX registers

3: (raw : array [0..1023] of byte); { 1024} // just raw data
end;

procedure xsave;

{==========================} implementation {==========================}

{***********************************************************************
* XSAVE: *
* *
* Save the entire processor state for debugging purposes *
***********************************************************************}
procedure xsave; assembler; {&uses none} {&frame+}
var xs: array [0..2047] of char;
var xp: xsptr;

asm
//a-in xsave
cmp x_int3, true
jne @99

pushad

//------------------------------------------------------------------
// clear out save area
//------------------------------------------------------------------
lea edi, xs
xor eax, eax
mov ecx, type xs / 4
rep stosd

//------------------------------------------------------------------
// save area must be aligned on 64-byte boundary
//------------------------------------------------------------------
lea edi, xs
add edi, 63
and edi, -64
mov xp, edi

//------------------------------------------------------------------
// save everything that can be saved
//------------------------------------------------------------------
or eax, -1
or edx, -1
{ xsave [edi] } db $0f,$ae,$27

//------------------------------------------------------------------
// display data in "Watches" window
// - xp^ : all
// - xp^.fpu : all FPU registers as extended
// - xmm_32[0]._lo: contents of XMM0 as 4 longints
// - etc...
//------------------------------------------------------------------
int 3

popad

@99:
//a-out
end; {xsave}

end.
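
The same layout can be checked offline against a saved dump. This is a sketch in Python that assumes the 512-byte legacy area in its 32-bit form: FCW at offset 0, FSW at 2, the abridged one-byte tag at 4, MXCSR at 24, the x87 registers at 32 and XMM0-7 at 160, each in a 16-byte slot. The function name and the synthetic dump are made up. In this format the tag byte has one bit per physical register (1 = in use), while the register slots are stored in ST(0)..ST(7) order.

```python
import struct

def parse_fxsave_area(area: bytes) -> dict:
    """Pull the x87-related fields out of a 512-byte FXSAVE-style area."""
    if len(area) < 512:
        raise ValueError("expected at least 512 bytes")
    fcw, fsw, ftw, _reserved, fop = struct.unpack_from("<HHBBH", area, 0)
    mxcsr, mxcsr_mask = struct.unpack_from("<II", area, 24)
    # Each x87 register uses the first 10 bytes of its 16-byte slot.
    st = [area[32 + 16 * i: 42 + 16 * i] for i in range(8)]
    xmm = [area[160 + 16 * i: 176 + 16 * i] for i in range(8)]
    return {
        "fcw": fcw, "fsw": fsw, "ftw": ftw, "fop": fop,
        "mxcsr": mxcsr, "mxcsr_mask": mxcsr_mask,
        "top": (fsw >> 11) & 7,
        "in_use": [r for r in range(8) if (ftw >> r) & 1],
        "st": st, "xmm": xmm,
    }

# Synthetic dump: two values loaded after FNINIT (TOP = 6, registers 6 and 7 used).
dump = bytearray(512)
struct.pack_into("<HHB", dump, 0, 0x037F, 0x3000, 0xC0)
struct.pack_into("<I", dump, 24, 0x1F80)

info = parse_fxsave_area(bytes(dump))
print(info["top"], info["in_use"], hex(info["mxcsr"]))
```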

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/indez.html
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
R.Wieser
2021-08-08 10:47:11 UTC
Robert,
Post by Robert Prins
You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit)
you're running this code on....
My apologies, I did not think that it would matter (still don't, but ...).

The OS is Windows, XP pro sp3, 32 bit. The used environment is Borland's
Tasm v5.2 (Assembler).
Post by Robert Prins
David Lindauer's GRDB (DOS) can show the contents of FPU registers
The idea is that I would be able to write such FPU debugging code myself.
Somehow I like it that way. :-)
Post by Robert Prins
// save area must be aligned on 64-byte boundary
...
Post by Robert Prins
{ xsave [edi] } db $0f,$ae,$27
Both were what I was looking for. Thanks.

Alas, I still can't get it to work :

lea edi,[@@Foo] ;size is 2000h. Plenty of space.
add edi,003Fh ;[1]
and edi,not 003Fh
or eax,-1 ;Not mentioned in my docs, but ...
or edx,-1
db 0Fh,0AEh,27h ;xsave [edi]

It still "crashes" ("{program.exe} has encountered a problem and needs to
close. We are sorry for the inconvenience.")

[1] My "The IA-32 Intel Architecture Software Developer's Manual, Volume 2"
mentions an alignment of 16.
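
The add/and pair is the usual "round up to a power of two" idiom, and the two alignment figures do not conflict: FXSAVE wants 16 bytes, XSAVE wants 64, and a 64-byte aligned address satisfies both. The arithmetic can be checked with a small Python sketch; the function name and the sample addresses are made up.

```python
def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    # Same as: add reg, alignment-1 / and reg, not (alignment-1)
    return (address + alignment - 1) & ~(alignment - 1)

print(hex(align_up(0x00403005, 64)))
print(hex(align_up(0x00403005, 16)))
print(hex(align_up(0x00403040, 64)))   # already aligned, stays put
```

Rounding up by 64 can move the pointer forward by up to 63 bytes, so the buffer has to be at least that much larger than the save area itself.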

Any ideas ?

Regards,
Rudy Wieser
Robert
2021-08-08 22:50:02 UTC
Post by R.Wieser
Robert,
Post by Robert
It might have some safeguards against executing data :)
I've used the "trick" before, so I don't think so. Currently I'm
torn between the possibilities that the processor I'm using might
not be having that command, that I'm simply bungling up or that
there is some kind of memory alignment involved (the latter one
would not be the first time I've run into it).
Well, please make sure the pointer is correct (trash easily
gets caught in the upper bits in mixed-mode) and your pgm owns
the memory it points at. Otherwise, segfault.
Post by R.Wieser
Post by Robert
Actually, I believe the registers are a circular file,
It has to be, as my example code works : after the second FLD1 the TOS is 6.
But I can still execute a FSTP ST(2) ,which seemingly points at 6+2 = 8.
Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.
Post by R.Wieser
Post by Robert
34 years ago I wrote an extension to MS-DOS DEBUG.COM
to examine the x87.
I'm not sure what you mean with an 'extension' (wasn't aware that
Debug supported such a thing), but years ago I wrote something for it
(using memory patching) so it could deal with a few more opcodes.
Very similar. I added code and patched the command jump table
to enter it when commanded.

-- Robert
R.Wieser
2021-08-09 07:27:16 UTC
Robert,
Post by Robert
Well, please make sure the pointer is correct
:-) And how do you propose that should be done ? It sounds like a great
idea, but ...
Post by Robert
(trash easily gets caught in the upper bits in mixed-mode)
Somewhere along the line I forgot to mention that I was programming in
32-bit mode (under Win XP). So, no mixed mode and no trash in the upper
bits.
Post by Robert
Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.
Well ... It /can/ be achieved that way, but only under certain conditions
(related to origin and size). :-)

The problem has been located though : I simply used the wrong R/M value
while hand-encoding the FXSAVE command (likely mixing up the 16 bit table
with the 32 bit one). IOW, I was providing the target address in a certain
register while the command expected it in another register/form.
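
The hand-encoding step can be checked with a few lines of Python. This is a sketch that assumes 32-bit addressing with a plain [register] operand (mod = 00) and the 0F AE opcode group, where the "reg" field of the ModRM byte selects the instruction: /0 is FXSAVE, /1 is FXRSTOR and /4 is XSAVE. The helper names are made up, and ESP and EBP are left out because with mod = 00 they select the SIB and disp32 forms.

```python
# r/m numbers for [reg] with 32-bit addressing, mod = 00.
RM32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}

FXSAVE, FXRSTOR, XSAVE = 0, 1, 4   # opcode extensions (/digit) in group 0F AE

def modrm_indirect(reg_field: int, base: str) -> int:
    """ModRM byte for a [base] operand: mod=00, reg=reg_field, r/m=base."""
    return (0b00 << 6) | (reg_field << 3) | RM32[base]

def encode_0fae(reg_field: int, base: str) -> bytes:
    """Bytes for a 'db' line encoding an 0F AE /digit instruction on [base]."""
    return bytes([0x0F, 0xAE, modrm_indirect(reg_field, base)])

print(encode_0fae(FXSAVE, "edi").hex(" "))
print(encode_0fae(XSAVE, "edi").hex(" "))
```

In the 16-bit table r/m = 5 means [DI], but with 32-bit addressing the same value means "a 32-bit displacement follows", which is one way such a mix-up ends in a crash. Note also that 0F AE 27 is the XSAVE form and not FXSAVE.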

Regards,
Rudy Wieser
Robert
2021-08-09 13:08:37 UTC
Post by R.Wieser
Robert,
Post by Robert
Well, please make sure the pointer is correct
:-) And how do you propose that should be done ?
It sounds like a great idea, but ...
Walk before you run, when in trouble, drop back. Before trying
a potentially troublesome instruction like FXSAVE, use MOV.
Even hand-assemble from hex if those facilities are in doubt:

MOV EAX, "pointer" ; to see if you can read loc
MOV "pointer", EAX ; to see if you can write
Post by R.Wieser
Post by Robert
(trash easily gets caught in the upper bits in mixed-mode)
Somewhere along the line I forgot to mention that I was programming in 32-bit
mode (under Win XP). So, no mixed mode and no trash in the upper bits.
I don't think XP does 64, but the CPU might. The upper-upper could
get trash. ISTR needing to set something to get IN/OUT to work.
Post by R.Wieser
Post by Robert
Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.
Well ... It /can/ be achieved that way, but only under
certain conditions (related to origin and size). :-)
Zero origin, power-of-two size. Check on both.
Ever wonder why there are so many buffers this way?
Post by R.Wieser
The problem has been located though : I simply used the wrong R/M value
while hand-encoding the FXSAVE command (likely mixing up the 16 bit table
with the 32 bit one). IOW, I was providing the target address in a certain
register while the command expected it in another register/form.
Debugging with a (hand-assembled) MOV test could have caught it.

-- Robert
R.Wieser
2021-08-09 13:50:18 UTC
Robert
Post by Robert
Post by R.Wieser
Post by Robert
Well, please make sure the pointer is correct
:-) And how do you propose that should be done ?
It sounds like a great idea, but ...
...
Post by Robert
use MOV.
How would that change anything ? If the target for an FXSAVE is wrong
enough that it causes an exception, how /wouldn't/ that be in the same way
wrong for a MOV ? (let's forget about alignment for a moment)

It would even be making the problem larger, as you would then need to pick a
REG value too - and wonder if it perhaps is having a negative influence on
the result.

FWIW, I tried several R/M values, none of which wanted to work. Bad luck I
guess.

In retrospect I should perhaps have tried loading all the common registers
with the same value and tried all R/M values until something worked. On
success it would be a case of determining which register is the source, and
then look back at the instruction set to find a match - and from it figure
out what the/my mistake was.
Post by Robert
Zero origin, power-of-two size. Check on both.
Ever wonder why there are so many buffers this way?
No, never. Really ... <whistle>
Post by Robert
Debugging with a (hand-assembled) MOV test could have caught it.
I doubt it. See above.

Regards,
Rudy Wieser
Robert Redelmeier
2021-08-09 14:56:52 UTC
Post by R.Wieser
Robert
Post by Robert
Post by R.Wieser
Post by Robert
Well, please make sure the pointer is correct
:-) And how do you propose that should be done ?
It sounds like a great idea, but ...
use MOV.
How would that change anything ? If the target for
an FXSAVE is wrong enough that it causes an exception,
how /wouldn't/ that be in the same way wrong for a MOV ?
(let's forget about alignment for a moment)
It is a purer memory test. I thought there was question
of whether FXSAVE was available or supported on your CPU.
This checks opcode encoding too.
Post by R.Wieser
It would even be making the problem larger, as you would
then need to pick a REG value too - and wonder if it perhaps
is having a negative influence on the result.
All GP registers should be available at all times.
Post by R.Wieser
FWIW, I tried several R/M values, none of which wanted
to work. Bad luck I guess.
Encoding should not be a guessing game.
The odds are bad, <1% .
Post by R.Wieser
In retrospect I should perhaps have tried loading all the
common registers with the same value and tried all R/M
values until something worked. On success it would be
a case of determining which register is the source, and
then look back at the instruction set to find a match -
and from it figure out what the/my mistake was.
x86 has quirky indirect addressing modes that
are unlikely to yield to trial-and-error.

-- Robert
R.Wieser
2021-08-09 16:11:26 UTC
Robert,
Post by Robert Redelmeier
It is a purer memory test.
In what way ? And mind you, I already addressed that.
Post by Robert Redelmeier
I thought there was question of whether FXSAVE was available
or supported on your CPU.
As I could not get a working FXSAVE encoding I started to doubt.
Post by Robert Redelmeier
This checks opcode encoding too.
No need for that, as those two bytes came from an opcode list. The only
unknown part was the addressing of the target memory.
Post by Robert Redelmeier
All GP registers should be available at all times.
Agreed. But it is an extra factor, and as such interference.
Post by Robert Redelmeier
Encoding should not be a guessing game.
What makes you think I was ? I tried a few different R/M encodings
(while providing different registers), and none of them wanted to work.
Hence my (above) described doubt as to whether the command was available on my
'puter/processor. (read: I was quite certain I did it "by the book")

But when you /know/ something ought to work and you cannot make it so then a
pragmatic approach will be called for. Which includes throwing everything
and the kitchen sink at it to see if /something/ will work. And from that
try to reason back why it does and where you went wrong with the first
attempts.
Post by Robert Redelmeier
x86 has quirky indirect addressing modes that
are unlikely to yield to trial-and-error.
True. But I would not be looking for those. Just a simple one that
/does/ function. From that foot-in-the-door the rest often follows.

And that is effectively what happened when Wolfgang supplied me with a
working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the
mod,reg,r/m tables I had used I realized I had been using the wrong one. It
was as simple as that.

Regards,
Rudy Wieser
Robert Prins
2021-08-10 15:54:47 UTC
Post by R.Wieser
And that is effectively what happened when Wolfgang supplied me with a
working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the
mod,reg,r/m tables I had used I realized I had been using the wrong one. It
was as simple as that.
Use

<https://defuse.ca/online-x86-assembler.htm>

for all your "db" needs. I use it "all the time" to get P5+ opcodes for the Virtual
Pascal in-line assembler. I've become a huge fan of using AVX instructions, and
miraculously, most of the data structures I was using in 1985 (TP3), then
16-bit, now 32-bit, are almost perfectly suited for XMM and YMM code. Go figure!

Robert
R.Wieser
2021-08-10 14:46:26 UTC
Robert,
Post by Robert Prins
Use
<https://defuse.ca/online-x86-assembler.htm>
for all your "db" needs.
Thank you very much. It will certainly come in handy. :-)

... and it doesn't even need JS to "do its thing". <thumbs up>

Regards,
Rudy Wieser
Kerr-Mudd, John
2021-08-10 19:53:41 UTC
On Tue, 10 Aug 2021 16:46:26 +0200
Post by R.Wieser
Robert,
Post by Robert Prins
Use
<https://defuse.ca/online-x86-assembler.htm>
for all your "db" needs.
Thank you very much. It will certainly come in handy. :-)
... and it doesn't even need JS to "do its thing". <thumbs up>
I tried mov ax,bx and got
6689D8

I guess x86 means 32bit nowadays!
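
The extra 66h is the operand-size prefix: the assembler defaulted to 32-bit code, where a 16-bit register move needs it, whereas in 16-bit code the same move is just 89 D8. This is a sketch of the decoding in Python, limited to opcode 89h (MOV r/m, r) with a register-direct ModRM byte. The function name is made up for illustration.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_rm_r(code: bytes, default_32bit: bool) -> str:
    """Decode 'MOV r/m, r' (opcode 89h, mod = 11), honoring a 66h prefix."""
    wide = default_32bit
    pos = 0
    if code[pos] == 0x66:          # operand-size prefix flips the default
        wide = not wide
        pos += 1
    if code[pos] != 0x89:
        raise ValueError("only opcode 89h is handled in this sketch")
    modrm = code[pos + 1]
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct operands are handled")
    names = REG32 if wide else REG16
    return f"mov {names[modrm & 7]},{names[(modrm >> 3) & 7]}"

print(decode_mov_rm_r(bytes.fromhex("6689D8"), default_32bit=True))
print(decode_mov_rm_r(bytes.fromhex("89D8"), default_32bit=False))
print(decode_mov_rm_r(bytes.fromhex("89D8"), default_32bit=True))
```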
--
Bah, and indeed Humbug.
Anton Ertl
2021-08-11 07:59:34 UTC
Permalink
Post by Kerr-Mudd, John
I guess x86 means 32bit nowadays!
That's the problem with "x86": People use it to mean any of several
different ISAs. So better avoid that term, and use:

8086 (rarely called IA-16) when you mean that instruction set.
IA-32 when you mean that instruction set (first implementation: 80386)
AMD64 when you mean that instruction set (first implementation: AMD K8
(Opteron, Athlon 64))

And then there are extensions, like the additional 80186 and 80286
instructions (plus the 80286 offers protected mode), or SSE, SSE2,
AVX, ...

Now what does that mean for the name of this newsgroup?

- anton
--
M. Anton Ertl Some things have to be seen to be believed
***@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
wolfgang kern
2021-08-11 08:45:44 UTC
Permalink
Post by Anton Ertl
Post by Kerr-Mudd, John
I guess x86 means 32bit nowadays!
That's the problem with "x86": People use it to mean any of several
different ISAs. So better avoid that term, and use:
8086 (rarely called IA-16) when you mean that instruction set.
IA-32 when you mean that instruction set (first implementation: 80386)
AMD64 when you mean that instruction set (first implementation: AMD K8
(Opteron, Athlon 64))
And then there are extensions, like the additional 80186 and 80286
instructions (plus the 80286 offers protected mode), or SSE, SSE2,
AVX, ...
Now what does that mean for the name of this newsgroup?
I won't recommend splitting our CLAX into several CPU-related groups.
All Intel/AMD 16 bit instruction sets differ between CPU families, and
almost all regular readers of this group are aware of this anyway.
__
wolfgang

wolfgang kern
2021-08-11 08:34:11 UTC
Permalink
Post by Kerr-Mudd, John
Post by Robert Prins
<https://defuse.ca/online-x86-assembler.htm>
I tried mov ax,bx and got
6689D8
I guess x86 means 32bit nowadays!
:) of course!
16 bit code will soon just belong to history.

I'm happy to have my own 16/32/64bit disassembler. It already needs many
updates now, but it saves me from internet access.

89 D8 is the STORE variant, which should only be used for memory writes.

8B C3 would be the correct LOAD opcode. I have been fighting for this for
decades but no one ever listened, so Intel and AMD will never get rid of these
doubles and can't make space for 64 other instructions with 89 mod 3,
like the added valid opcodes for the formerly illegal 8F08 and 8F10.
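
The two encodings of the same register-to-register move can be reproduced with
a few lines of Python. This is only a sketch: the function name and its
parameters are made up for illustration. It builds the ModRM byte for both
opcode 89 (reg = source, r/m = destination) and opcode 8B (reg = destination,
r/m = source), and adds the 66h operand-size prefix that a 32-bit assembler
needs for 16-bit registers.

```python
# Register numbers used in the ModRM byte for the 16-bit general registers.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_r16_r16(dst, src, store_form, mode_bits=16):
    """Encode `mov dst, src` between two 16-bit registers (illustrative helper).

    store_form=True  uses opcode 89 /r (MOV r/m16, r16): reg = src, r/m = dst.
    store_form=False uses opcode 8B /r (MOV r16, r/m16): reg = dst, r/m = src.
    In 32-bit code a 66h operand-size prefix selects 16-bit operands.
    """
    d, s = REG16[dst], REG16[src]
    if store_form:
        opcode, reg, rm = 0x89, s, d
    else:
        opcode, reg, rm = 0x8B, d, s
    modrm = 0xC0 | (reg << 3) | rm          # mod = 11b: register operand
    prefix = [0x66] if mode_bits == 32 else []
    return bytes(prefix + [opcode, modrm])

# What the online assembler produced for 32-bit code:
print(encode_mov_r16_r16("ax", "bx", store_form=True, mode_bits=32).hex())  # 6689d8
# The alternative encoding in 16-bit code:
print(encode_mov_r16_r16("ax", "bx", store_form=False).hex())               # 8bc3
```

Both byte sequences perform the same move; which one an assembler emits is its
own choice.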
__
wolfgang
wolfgang kern
2021-08-08 10:03:39 UTC
Permalink
On 07.08.2021 17:51, R.Wieser wrote:
...
Post by R.Wieser
and for some reason I can't get the FXSAVE to work (my assembler shows its
age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....."
sequence crashes the program).
on older CPUs 0F AE xx will raise exception 6 [illegal opcode] if:

1) bit 5 of xx is 1 (xx 20..3F, 60..7F, A0..BF)
newer CPUs may show a few valid instructions (see sandpile.org)

2) mod=3 aka register operand (C0..FF) [memory only!]

3) may raise EXC_6 if not supported
0F AE 90..97 98..9F mean LDMXCSR STMXCSR [support specific]

so I'd recommend either
0F AE 06 00 xx FXSAVE [xx00h] (needs 512 byte DS: buffer !)
or shorter
0F AE 00 FXSAVE [bx+si] (ditto)
or HLL styled :)
0F AE 46 00 FXSAVE [bp+0] (needs 512 byte on SS: stack)
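
As a quick sanity check, rules 1) and 2) above can be written as a small Python
test on the byte that follows 0F AE. This is a sketch only: the function name
is hypothetical, rule 3) (support-specific opcodes) is not modelled, and real
CPUs decide for themselves what they accept.

```python
def old_cpu_rejects(xx):
    """Return why '0F AE xx' breaks rule 1) or 2) listed above, else None."""
    if xx >= 0xC0:        # rule 2: mod = 3, register operand (memory only)
        return "mod=3 (register operand)"
    if xx & 0x20:         # rule 1: bit 5 set (20..3F, 60..7F, A0..BF)
        return "bit 5 set"
    return None

for xx in (0x06, 0x00, 0x46, 0x27, 0xC0):
    verdict = old_cpu_rejects(xx) or "accepted by these rules"
    print(f"0F AE {xx:02X}: {verdict}")
```

The three recommended bytes (06, 00, 46) pass both rules, while 27 has bit 5
set.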
__
wolfgang
R.Wieser
2021-08-08 11:35:46 UTC
Permalink
Wolfgang,
Post by wolfgang kern
so I'd recommend either
...
Post by wolfgang kern
or shorter
0F AE 00 FXSAVE [bx+si] (ditto)
For testing purposes I tend to go with the most basic one first, so I took
that one.
Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectively.

Alas, same problem : crash.

Aligning [EBX+ESI] on a 64 byte boundary (as suggested by Robert Prins) did
not make a difference.

I'm starting to lean towards the possibility that the command is refused
(does not exist). Is there any way to check it ?

Regards,
Rudy Wieser
wolfgang kern
2021-08-08 12:16:43 UTC
Permalink
Post by R.Wieser
Post by wolfgang kern
so I'd recommend either
...
Post by wolfgang kern
or shorter
0F AE 00 FXSAVE [bx+si] (ditto)
For testing purposes I tend to go with the most basic one first, so I took
that one.
Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectively.
Alas, same problem : crash.
Aligning [EBX+ESI] on a 64 byte boundary (as suggested by Robert Prins) did
not make a difference.
I'm starting to lean towards the possibility that the command is refused
(does not exist). Is there any way to check it ?
look up CPUID, one of the returned bits (FXSR, EDX bit 24 of function 1)
tells if it is present or not.
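
A sketch of how the returned bits could be interpreted, assuming you already
have the EDX (and optionally ECX) value that CPUID function 1 returned, for
example from a few lines of assembly. Python cannot execute CPUID itself, so
the sample value below is made up, and the helper names are hypothetical. The
bit positions are the documented ones.

```python
# Feature flags reported by CPUID function 1 (EAX=1).
EDX_FLAGS = {0: "FPU", 23: "MMX", 24: "FXSR", 25: "SSE", 26: "SSE2"}
ECX_FLAGS = {26: "XSAVE", 27: "OSXSAVE"}

def decode_flags(value, table):
    """List the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]

def has_fxsr(edx):
    """True if FXSAVE/FXRSTOR are reported as present (EDX bit 24)."""
    return bool(edx & (1 << 24))

sample_edx = 0x0383FBFF   # made-up example value, not read from a real CPU
print(decode_flags(sample_edx, EDX_FLAGS))
print("FXSAVE present:", has_fxsr(sample_edx))
```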
__
wolfgang
wolfgang kern
2021-08-08 12:57:01 UTC
Permalink
Post by R.Wieser
Post by wolfgang kern
0F AE 00 FXSAVE [bx+si] (ditto)
For testing purposes I tend to go with the most basic one first, so I took
that one.
Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectively.
Alas, same problem : crash.
Aligning [EBX+ESI] on a 64 byte boundary (as suggested by Robert Prins) did
not make a difference.
within 32 bit:
0F AE 00 is FXSAVE [eax] (uses DS:)
__
wolfgang
wolfgang kern
2021-08-08 11:41:35 UTC
Permalink
...
Post by R.Wieser
and for some reason I can't get the FXSAVE to work (my assembler shows its
age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....."
sequence crashes the program).
1) bit 5 of xx is 1  (xx 20..3F, 60..7F, A0..BF)
  newer CPU may show a few valid instructions (see sandpile.org)
2) mod=3 aka register operand (C0..FF) [memory only!]
3) may raise EXC_6 if not supported
   0F AE 90..97 98..9F  mean LDMXCSR STMXCSR [support specific]
so I'd recommend either
   0F AE 06 00 xx  FXSAVE [xx00h]  (needs 512 byte DS: buffer !)
or shorter
   0F AE 00        FXSAVE [bx+si]  (ditto)
or HLL styled :)
   0F AE 46 00     FXSAVE [bp+0]   (needs 512 byte on SS: stack)
you seem to work with 32 bit:

0F AE 07 FXSAVE [edi]

you used 27, so I was confused and had to look at my AMD docs,
it says: FXSAVE mem512env 0F AE /0 (this zero is the value of bits 3..5)
and I also checked on sandpile.org.
0F AE /4 means XSAVE (it's for the extended CPU state and not just for the FPU)
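
The "/0" and "/4" notation is simply bits 3..5 of the byte after 0F AE. A
minimal Python sketch, with a hypothetical helper name; the table lists the
memory-operand forms without prefixes:

```python
# Opcode extensions of 0F AE, selected by bits 3..5 of the ModRM byte.
GROUP_0F_AE = {
    0: "FXSAVE", 1: "FXRSTOR", 2: "LDMXCSR", 3: "STMXCSR",
    4: "XSAVE", 5: "XRSTOR", 6: "XSAVEOPT", 7: "CLFLUSH",
}

def slash_digit(modrm):
    """Return the /digit (bits 3..5) of a ModRM byte."""
    return (modrm >> 3) & 7

for modrm in (0x07, 0x27):
    d = slash_digit(modrm)
    print(f"0F AE {modrm:02X} -> /{d} = {GROUP_0F_AE[d]}")
```

So 07 selects /0 (FXSAVE) while 27 selects /4 (XSAVE), with the same r/m field
in both.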
__
wolfgang
R.Wieser
2021-08-08 14:00:47 UTC
Permalink
Wolfgang,
Post by wolfgang kern
you seem to work with 32 bit:
I am. Didn't think it would matter much.
Post by wolfgang kern
0F AE 07 FXSAVE [edi]
I just tried that one, and it worked ! (got 288 bytes of data though, not
512). As a result I'm now thoroughly confused in regard to the mod, reg, r/m
encoding. I tried different ones, but only got crashes.
Post by wolfgang kern
you used 27, so I was confused and had to look at my AMD docs,
That value was suggested by Robert (in his code). And as I didn't get
anywhere ...


Oh blimey - I don't know how I did it, but I just noticed that I somehow
mixed up the 16 and 32-bit mod/reg/rm encodings. With MOD and REG both
being zero, the registers targeted by R/M are rather different between them.
:-|
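
To make the mix-up concrete, here is a small Python sketch of the two r/m
tables for MOD = 00. The table and function names are made up for this
example; only MOD = 00 is covered.

```python
# Memory operand selected by the r/m field (bits 0..2) when mod = 00.
RM16 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
        "[si]", "[di]", "[disp16]", "[bx]"]
RM32 = ["[eax]", "[ecx]", "[edx]", "[ebx]",
        "[SIB]", "[disp32]", "[esi]", "[edi]"]

def rm_operand(modrm, addr_bits):
    """Return the mod=00 memory operand for 16- or 32-bit addressing."""
    if modrm >> 6 != 0:
        raise ValueError("only mod=00 is tabulated in this sketch")
    table = RM16 if addr_bits == 16 else RM32
    return table[modrm & 7]

for modrm in (0x00, 0x06, 0x07):
    print(f"{modrm:02X}: 16-bit {rm_operand(modrm, 16):9} "
          f"32-bit {rm_operand(modrm, 32)}")
```

The same byte 00 means [bx+si] with 16-bit addressing but [eax] with 32-bit
addressing, and 06 turns from a bare disp16 into [esi].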

Bottom line: I made a stupid mistake, created non-working code and got
myself confused as a result. And as I presumptuously forgot to mention the
basics of what I was busy with (32-bit coding) I did really help you guys
find the cause of it. My apologies for that.

Regards,
Rudy Wieser
R.Wieser
2021-08-08 14:30:38 UTC
Permalink
Post by R.Wieser
I did really help you guys find the cause of it. My apologies for that.
Ehrms ... "I did *not* really help" of course.

Regards,
Rudy Wieser
wolfgang kern
2021-08-08 14:35:02 UTC
Permalink
Post by R.Wieser
I am. Didn't think it would matter much.
Post by wolfgang kern
0F AE 07 FXSAVE [edi]
I just tried that one, and it worked ! (got 288 bytes of data though, not
512) As a result I'm now thoroughly confused in regard to the mod, reg, r/m
encoding. I tried different ones, but only got crashes.
IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the
required buffer size.
Post by R.Wieser
Post by wolfgang kern
you used 27, so I was confused and had to look at my AMD docs,
That value was suggested by Robert (in his code). And as I didn't get
anywhere ...
Oh blimey - I don't know how I did it, but I just noticed that I somehow
mixed up the 16 and 32-bit mod/reg/rm encodings. With MOD and REG both
being zero, the registers targeted by R/M are rather different between them.
:-|
Bottom line: I made a stupid mistake, created non-working code and got
myself confused as a result. And as I presumptuously forgot to mention the
basics of what I was busy with (32-bit coding) I did really help you guys
find the cause of it. My apologies for that.
I was once there as well :) experience can't be bought!
just fine that we could help, no need for apology.
__
wolfgang
R.Wieser
2021-08-08 16:55:24 UTC
Permalink
Wolfgang,
Post by wolfgang kern
IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the required
buffer size.
Those 512 bytes do (currently) not seem to be /required/. I initialized the
buffer using a specific byte, and from that I could see that nothing from 288
and up was touched (the "reserved" areas below it however were).
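
The fill-pattern check described above can be sketched in Python as follows.
The function name is hypothetical and the "save" is simulated by overwriting
the first 288 bytes; on real hardware the buffer would come from the FXSAVE
call. Note that a written byte which happens to equal the fill value is
indistinguishable from an untouched one, so the result is a lower bound.

```python
def touched_extent(buf, fill):
    """Return one past the index of the last byte that differs from `fill`."""
    for i in range(len(buf) - 1, -1, -1):
        if buf[i] != fill:
            return i + 1
    return 0

FILL = 0xCC                         # arbitrary marker byte
buf = bytearray([FILL]) * 512       # pre-filled 512 byte save area
buf[0:288] = bytes(288)             # simulated save: 288 zero bytes written
print(touched_extent(buf, FILL))    # 288
```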

Perhaps that 288-and-up "reserved" area is meant for future generations of
the x87 FPU.
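
For reference, the 288 figure matches the documented FXSAVE layout: a 32 byte
header, eight 16 byte slots for ST(0)..ST(7), then 16 byte slots for the XMM
registers. Outside 64-bit mode only XMM0..XMM7 exist; in 64-bit mode XMM8..15
occupy the bytes from 288 up to 416, and the rest of the 512 byte area is
reserved. A small Python sketch of that arithmetic (the constant and function
names are made up):

```python
HEADER_SIZE = 32      # control/status/tag words, pointers, MXCSR and its mask
SLOT_SIZE = 16        # every ST and XMM register gets a 16 byte slot
ST_BASE = HEADER_SIZE                 # ST(0) starts at offset 32
XMM_BASE = ST_BASE + 8 * SLOT_SIZE    # XMM0 starts at offset 160
AREA_SIZE = 512

def st_offset(i):
    """Offset of ST(i) inside the FXSAVE area."""
    return ST_BASE + SLOT_SIZE * i

def xmm_offset(i):
    """Offset of XMMi inside the FXSAVE area."""
    return XMM_BASE + SLOT_SIZE * i

def bytes_used(num_xmm):
    """Extent of the area covered by header, ST and `num_xmm` XMM slots."""
    return XMM_BASE + SLOT_SIZE * num_xmm

print(bytes_used(8))    # 288: 32-bit code, XMM0..XMM7
print(bytes_used(16))   # 416: 64-bit mode, XMM0..XMM15
```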
Post by wolfgang kern
I was once there as well :) experience can't be bought!
I can only hope that I remember it for quite a while.
Post by wolfgang kern
just fine that we could help,
And thanks for that.
Post by wolfgang kern
no need for apology.
:-) In that case you may regard it as an explanation of what the problem
actually was. I know that when I try to help someone I often get curious
about it.

Regards,
Rudy Wieser