Discussion:
Why does adding a 49 prefix to this instruction crash?
none) (albert
2020-03-07 10:39:51 UTC
I have a 64-bit Forth system, and I can add a program that
executes a single instruction, like so:
"
WANT ASSEMBLERi86

CODE PIET
MOVI|X, AX| 2 IL,
NEXT,
END-CODE
"

And execute it like so
PIET OK

This program does nothing. It fills EAX with 2 which is inconsequential
because EAX is a free register. [Only SP BP and SI are used in the
virtual system.]
Now let us prefix the instruction, such that the alternate register set
is used. This should be likewise inconsequential.

CODE PIET1
$49 C, \ That is the way to do that in Forth
MOVI|X, AX| 2 IL,
NEXT,
END-CODE

Now PIET1 leads to a segfault.
I've no clue what could cause this.

I have been working with those prefixes for ages.
My ciasdis has disassembled and reassembled a 64 bit elf program
without problems.
[ This is in the context of an optimiser, I seem to have used
this R1 in optimised programs, that work. ]

Groetjes Albert
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
JJ
2020-03-08 05:23:02 UTC
Inserting that 0x49 byte, which in 64-bit mode is a REX prefix with the W and B
bits set, changes the instruction from:

mov eax, <32-bit immediate value>

To:

mov r8, <64-bit immediate value>

It also changes the instruction length from 5 bytes to 10 bytes.

So the 4 bytes following the `mov eax, <value>` instruction become the upper
32 bits of the immediate in `mov r8, <value>`.

If the generated code is like this:

49 db 49h
B8 02 00 00 00 mov eax, 2
C3 ret
90 nop
90 nop
90 nop

It will be interpreted as:

49 B8 02 00 00 00 C3 90 90 90 mov r8, 909090C300000002

The byte that used to be the `RET` instruction is now part of the immediate, so
the code no longer returns to the caller right away. Execution runs on into
whatever bytes happen to follow, so the result is unpredictable and the
program eventually crashes.
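
A minimal sketch in Python (not from the thread) of the length change described
above. It assumes 64-bit mode and handles only the B8+r "MOV reg, imm" opcodes;
the function name decode_mov_imm is made up for this illustration.

```python
def decode_mov_imm(code: bytes):
    """Decode one B8+r MOV at the start of `code`, assuming 64-bit mode.

    Returns (instruction_length, register_number, immediate).
    """
    i = 0
    rex = 0
    if 0x40 <= code[i] <= 0x4F:          # 4x bytes are REX prefixes in 64-bit mode
        rex = code[i]
        i += 1
    opcode = code[i]
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a B8+r MOV")
    i += 1
    wide = bool(rex & 0x08)              # REX.W: 64-bit operand, 8-byte immediate
    reg = (opcode & 7) | (8 if rex & 0x01 else 0)   # REX.B selects r8..r15
    size = 8 if wide else 4
    imm = int.from_bytes(code[i:i + size], "little")
    return i + size, reg, imm


stream = bytes.fromhex("49 B8 02 00 00 00 C3 90 90 90")
length, reg, imm = decode_mov_imm(stream)        # with the 0x49 prefix
print(length, reg, hex(imm))
length, reg, imm = decode_mov_imm(stream[1:])    # without the prefix
print(length, reg, hex(imm))
```

With the prefix the decoder consumes all ten bytes, including the C3 that was
meant to be the RET; without it the instruction ends after five bytes.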
none) (albert
2020-03-08 09:53:10 UTC
Thank you. This is the correct answer. I did an experiment putting
a lot of nops after the MOVI. Sure enough, I got a long constant with
lots of 90 and no crashes.

One thing to add though: this behaviour is different than the
situation where there is an operation with immediate data.
Then the immediate data is always 32 bit and sign extended.
I missed the difference.

Groetjes Albert
R.Wieser
2020-03-08 11:19:40 UTC
Albert,
Post by none) (albert
Sure enough, I got a long constant with
lots of 90 and no crashes.
Where "lots" equals just four (the difference between a 32- and a 64-bit
operand). Though you might of course have many more, as those (beyond the
first four) will then be recognised as NOP instructions.

... which you can easily verify by using another value than 0x90. Like
0xCC ( INT 3 - the "breakpoint" instruction). Then only exactly four extra
bytes will /not/ cause your program to behave unexpectedly.

A suggestion though: Put some 0xCC bytes /after/ the RET instruction too.
That way providing too few bytes of immediate data will cause the same effect
(caused by executing an INT 3 /after/ the (intended!) RET) as providing too
many bytes will (caused by executing an INT 3 /before/ the intended RET).
Post by none) (albert
One thing to add though: this behaviour is different than
the situation where there is an operation with immediate
data.
No, it's /exactly/ the same. The modified-by-the-prefix command tries to
load a 64-bit register, and thus expects 8 bytes of immediate data,
where the unmodified one tries to load a 32-bit register, and therefore
expects just 4 bytes.

It's the same difference as between

MOVI AL, {immediate data}
MOVI AX, {immediate data}
MOVI EAX,{immediate data}

The size-in-bytes of the immediate data varies with the size of the
targeted register.
Post by none) (albert
Then the immediate data is always 32 bit and sign extended.
Nope :

1) It depends on the target register (or memory). See above

2) (immediate or not) data has /no/ sign. There are only a few
instructions which actually do look at the sign bit. Notably the
conditional jumps and the IMUL and IDIV ones. Yes, that's right, not even
the addition, subtraction and compare instructions themselves care about a
value being signed or not.

3) "and sign extended". If-and-when that happens that is your FORTH
compiler at work - most likely recognising a preceding minus sign in your
source code as an intention to have a negative value stored - and extends the
sign /while compiling/ (read: before the program is executed).

Regards,
Rudy Wieser
wolfgang kern
2020-03-08 11:39:21 UTC
On 08.03.2020 12:19, R.Wieser wrote:

...
Post by R.Wieser
3) "and sign extended". If-and-when that happens that is your FORTH
compiler at work - most likely recognising a preceeding minus sign in your
sourcecode as an intention to have a negative value stored - and extends the
sign /while compiling/ (read: before the program is executed).
:)

USE16:
81 00 80 FF add word [bx+si], 0xFF80 ;unsigned
83 00 80 add word [bx+si],-128 ;sign extended byte

both do the same ...
not all compilers default to the shorter variant.
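
A minimal Python sketch (not from the thread) of why the two encodings above add
the same value: the processor sign-extends the single immediate byte of the 83
form to the operand size at run time. The helper name sign_extend is made up for
this illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide two's complement value to to_bits."""
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign            # reinterpret as a signed number
    return signed & ((1 << to_bits) - 1)      # wrap to the wider unsigned form


# 83 00 80: imm8 = 0x80, extended to the 16-bit operand size
print(hex(sign_extend(0x80, 8, 16)))
# 81 00 80 FF: imm16 = 0xFF80, used as-is
print(hex(int.from_bytes(bytes.fromhex("80 FF"), "little")))
```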
__
wolfgang
R.Wieser
2020-03-08 14:04:42 UTC
Wolfgang
Post by wolfgang kern
81 00 80 FF add word [bx+si], 0xFF80 ;unsigned
83 00 80 add word [bx+si],-128 ;sign extended byte
both do the same ...
:-) You got me, there are indeed a few more instructions that actually do
consider signed immediate values.

Though did you /have/ to use that particular decimal value ? As a byte it's
one of two special values that can't be negated (made negative), which
rather distracted me. :-p

Regards,
Rudy Wieser
wolfgang kern
2020-03-08 15:45:24 UTC
Post by R.Wieser
Wolfgang
Post by wolfgang kern
81 00 80 FF add word [bx+si], 0xFF80 ;unsigned
83 00 80 add word [bx+si],-128 ;sign extended byte
both do the same ...
:-) You got me, there are indeed a few more instructions that actually do
consider signed immediate values
Though did you /have/ to use that particular decimal value ? As a byte its
one of two special values that can't be negated (made negative), which
rather distracted me. :-p
NEG AL
NOT AL

use32/64
6A 00 push 00 ;32bit Zero /64 bit
6A FF push -1 ;0xFFFFFFFF /64 bit

and a few more with 83... are quite handy to create short code.
__
wolfgang
R.Wieser
2020-03-08 17:38:07 UTC
Wolfgang,
Post by wolfgang kern
and a view more with 83... are quite handy to create short code.
:-) The problem might be that I've not been doing much hand-made code these
days. My assembler just picks the best one for the occasion. I've become
a bit lazy in that regard I'm afraid.

Oh, the days that I tried to re-write stuff so I could shave off a byte
here, and perhaps another one there ... Where have they gone ? Then
again, that was the time when 1.44 MByte floppies were /big/.

Regards,
Rudy Wieser
none) (albert
2020-03-08 15:33:40 UTC
Post by R.Wieser
Albert,
Post by none) (albert
Sure enough, I got a long constant with
lots of 90 and no crashes.
Where "lots" equals just four (the difference between a 32- and a 64-bit
operand). Though you might ofcourse have many more, as those (beyond the
first four) will than be recognised as NOP instructions.
<SNIP>
Its the same difference beween
MOVI AL, {immediate data}
MOVI AX, {immediate data}
MOVI EAX,{immediate data}
The size-in-bytes of the immediate data varies with the size of the
targetted register.
That is not how my assembler rolls. It requires you to specify
whether byte or XELL. Xell size depends on the environment
(segment type c.q. 32/64 bit mode) and an optional prefix, *and cannot
be specified in the instruction itself*.
Specifying that in the operand leads to misunderstandings.

Most people don't know that there are three instructions to
swap the RAX and the RBX registers, because their assembler
allows one. My assembler carefully discriminates between the three,
and generates exactly the object code you specify.
Post by R.Wieser
Post by none) (albert
Then the immediate data is always 32 bit and sign extended.
Confirmed see below
Post by R.Wieser
1) Ot depends on the target register (or memory). See above
2) (immediate or not) data has /no/ sign. There are only a few
instructions which actually do look at the sign bit. Notably the
conditional jumps and the IMUL and IDIV ones. Yes, thats right, not even
the addition, subtraction and compare instructions themselves care about a
value being signed or not.
3) "and sign extended". If-and-when that happens that is your FORTH
compiler at work - most likely recognising a preceeding minus sign in your
sourcecode as an intention to have a negative value stored - and extends the
sign /while compiling/ (read: before the program is executed).
The above is incorrect. By operation I mean something like ADD IMMEDIATE.
Here follows an experiment that proves that. I would never have
made the mistake regarding MOVI if I didn't have prior experience.
Note that IL, lays down a long (32 bits), not a quad, as the hexdump proves.
(For those not in the know: popping and pushing always imply 64 bit
operations.)

In the following I exercise the add immediate instruction.
"
~/PROJECT/ciasdis/ciasdis$ lina64 -a
AMDX86 ciforth 5.3.0

WANT ASSEMBLERi86 $-PREFIX
....

CODE INC
POP|X, AX|
$48 C,
ADDI, X| R| AX| 12 IL,
PUSH|X, AX|
NEXT,
END-CODE

CODE DEC
POP|X, AX|
$48 C,
ADDI, X| R| AX| -12 IL,
PUSH|X, AX|
NEXT,
END-CODE

WANT DO-DEBUG DUMP

S[ ] OK 'INC >CFA @ 20 DUMP 32 bit
vvvvvvvvv
0000,0000,0041,9DB8: 5848 81C0 0C00 0000 |XH......|
0000,0000,0041,9DC0: 5048 ADFF 2000 0000 0400 0000 0000 0000 |PH.. ...........|

S[ ] OK 'DEC >CFA @ 20 DUMP 32 bit
vvvvvvvvv
0000,0000,0041,A418: 5848 81C0 F4FF FFFF |XH......|
0000,0000,0041,A420: 5048 ADFF 2038 0F40 0000 0000 0004 0000 |PH.. 8.@........|

"
Now let us execute those small programs. H. prints a result in hex.
"
S[ ] 0 INC
S[ 12 ] OK H.
0000,0000,0000,000C
S[ ] OK 0 DEC
S[ -12 ] OK H.
FFFF,FFFF,FFFF,FFF4
"
Looks like a 64 bit signed constant to me.

QED
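
A minimal Python sketch (not part of the experiment above) that emulates the
seven bytes 48 81 C0 imm32 from the two dumps. It assumes 64-bit mode, where the
48 prefix makes the ADD operate on RAX and the 32-bit immediate is sign-extended
to 64 bits. The function name add_rax_imm32 is made up for this illustration.

```python
MASK64 = (1 << 64) - 1


def add_rax_imm32(rax: int, code: bytes) -> int:
    """Emulate REX.W 81 /0 with ModRM C0, i.e. ADD RAX, imm32 (64-bit mode)."""
    if code[:3] != bytes.fromhex("48 81 C0"):
        raise ValueError("expected 48 81 C0 imm32")
    # Only four immediate bytes are stored; they are read as a signed value,
    # which is the same as sign-extending them to 64 bits.
    imm = int.from_bytes(code[3:7], "little", signed=True)
    return (rax + imm) & MASK64


print(hex(add_rax_imm32(0, bytes.fromhex("48 81 C0 0C 00 00 00"))))   # INC
print(hex(add_rax_imm32(0, bytes.fromhex("48 81 C0 F4 FF FF FF"))))   # DEC
```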

See also
github/albertvanderhorst/ciasdis
Post by R.Wieser
Regards,
Rudy Wieser
Groetjes Albert

P.S. If I do an "INT 3" in my interpreter, it just segfaults.
What I call return is code that instructs the interpreter
to execute the next Forth instruction, not an actual RET instruction.
R.Wieser
2020-03-08 17:28:44 UTC
Albert,
Post by none) (albert
That is not how my assembler rolls.
You are targeting an X86 processor ? Then it /has/ to.
Post by none) (albert
It requires to specify whether byte or XELL.
/How/ you specify it doesn't matter. /What/ you specify does.
Post by none) (albert
xell size depends on the environment (segment type c.q. 32/64 bit mode)
:-) You call it "xell", we call it "native size". In this newsgroup we
still recognise a 16-bit native size.
Post by none) (albert
*and cannot be specified in the instruction itself*
Look at the three MOVI instructions I posted. Where do I specify the size
? I don't. It's implicitly taken from the target register. The same
happens when the target is some memory.
Post by none) (albert
Most people don't know that there are three instructions
to swap the RAX and the RBX registers, because their
assembler allows one.
Not /allows/, but /generates/ just an arbitrary one of them. When the
effect is exactly the same, why would you want to be able to pick one
yourself ? What /good/ would that do you ?
Post by none) (albert
Post by none) (albert
Then the immediate data is always 32 bit and sign extended.
Confirmed see below
You have confirmed exactly nothing. Besides the problem of calling it
"always 32 bit and sign extended" and in your conclusion "Looks like a 64
bit signed constant" <-no idea where you got that from, but certainly not
from your emitted code dump.

Ask yourself, why would a "MOVI AX, 12" need a four-byte immediate value ?
It doesn't. It could have generated a 05 0C 00 instead.

Also, when I disassemble the emitted bytes in that dump of yours I get
something a bit different:

58 pop eax
48 dec eax
81 C0 0C 00 00 00 add eax, 000000Ch
50 push eax
48 dec eax
AD lodsd
FF 20 jmp dword ptr [eax]

No AX anywhere. But the 32-bit immediate value now matches the size of
the, instead present, EAX register.

And odd, those /two/ "dec eax"-es in there. I would have expected just a
single one, so why two ? And why directly after two stack-related
instructions ?

And where has /your/ "$48 C," gone, and what do you think its good for ?
As you can see, to the processor its just a DEC EAX ...
Post by none) (albert
P.S. If I do an "INT 3" in my interpreter, it just segfaults.
Too bad, but yes, that can happen when you do not have an X86 single-stepper
available.
Post by none) (albert
What I call return is code that instructs the interpreter
to execute the next Forth instruction, not an actual RET instruction.
:-) Yeah, I already got the feeling that that "lodsd", "jmp [eax]" was to
access a pseudo-(call)stack.

Regards,
Rudy Wieser
none) (albert
2020-03-09 07:39:30 UTC
Post by R.Wieser
Albert,
Post by none) (albert
That is not how my assembler rolls.
You are targetting an X86 processor ? Than it /has/ to.
Post by none) (albert
It requires to specify whether byte or XELL.
/How/ you specify it doesn't matter. /What/ you specify does.
Post by none) (albert
xell size depends on the environment (segment type c.q. 32/64 bit mode)
:-) You call it "xell", we call it "native size". In this newsgroup we
still recognise a 16-bit native size.
Post by none) (albert
*and cannot be specified in the instruction itself*
Look at the three MOVI instructions I posted. Where do I specify the size
? I don't. Its implicitily taken from the target register. The same
happens when the target is some memory.
Post by none) (albert
Most people don't know that there are three instructions
to swap the RAX and the RBX registers, because their
assembler allows one.
Not /allows/, but /generates/ just an arbitrary one of them. When the
effect is the exactly same, why would you want to be able to pick one
yourself ? What /good/ would that do you ?
My ciasdis is a reverse engineering assembler. If you're in the habit
of analysing viruses you wouldn't ask this question.
Post by R.Wieser
Post by none) (albert
Post by none) (albert
Then the immediate data is always 32 bit and sign extended.
Confirmed see below
You have confirmed exactly nothing. Besides the problem of calling it
"always 32 bit and sign extended" and in your conclusion "Looks like a 64
bit signed constant" <-no idea where you got that from, but certainly not
from your emitted code dump.
Ask yourself, why would a "MOVI AX, 12" need a four-byte immediate value ?
It doesn't. It could have generated a 05 0C 00 instead.
Why do you insist on using a patronizing assembler that bamboozles you?
As long as it suits you that is fine, but for this matter it is
not.
Post by R.Wieser
Also, when I disassemble the emitted bytes in that dump of yours I get
58 pop eax
48 dec eax
81 C0 0C 00 00 00 add eax, 000000Ch
50 push eax
48 dec eax
AD lodsd
FF 20 jmp dword ptr [eax]
No AX anywhere. But the 32-bit immediate value now matches the size of
the, instead present, EAX register.
Okay. Apparently you insist on considering the code as being 32 bits.
Small wonder that everything looks 32 bit to you.
This is the central genius trick that allowed AMD to introduce 64 bit computing:
Repurpose the 4x instruction space as a prefix.
Now a 4x prefix in 64-bit mode says whether the instruction following is
using 64 bit size operands (when its W bit is set, i.e. 48..4F), as well as
specifying if any of the extra registers is used.
This makes the remainder of your comment irrelevant so I stop here.

If anybody wants the full story including disassemblies in 64 bit mode,
go back to the previous message.

Groetjes Albert
Post by R.Wieser
Regards,
Rudy Wieser
R.Wieser
2020-03-09 08:31:18 UTC
Albert,
Post by none) (albert
Post by R.Wieser
Not /allows/, but /generates/ just an arbitrary one of them. When the
effect is the exactly same, why would you want to be able to pick one
yourself ? What /good/ would that do you ?
My ciasdis is a reverse engineering assembler. If you're in the habit
of analysing viruses you wouldn't ask this question.
And what has being able to muck around with virus code to do with your FORTH
assembler ? That's right, absolutely nothing.

Though, feel free to come up with an example to why it does. I do not mind
being educated.
Post by none) (albert
Why do you insist in using a patronizing assembler that bamboozles
you? As long as it suits you that is fine but not for this matter it is
not.
:-) The code has to run on an x86 ? Then it doesn't really matter which
assembler you use. It's just syntactic sugar.
Post by none) (albert
Okay. Apparently you insist on considering the code as being 32 bits.
You were free to correct my assumption in that regard, but I have not seen
you do that. Besides, the dump in your previous post mentions it
Post by none) (albert
This is the central genius trick that allowed AMD to introduce 64 bit
computing: Repurpose the 4x instruction space as a prefix.
Ah, thataway. Yes, you got me. I do not have any 64-bit processor 'puters
here, am not aware of the 64-bit mnemonics and have no disassemblers for it
either. Although I think I could find out what it actually does from the
32-bit disassembly and a mnemonic lookup list for 64-bit instructions I
don't think I'm going to bother.

Goodbye.

Regards,
Rudy Wieser
Stephen Pelc
2020-03-09 14:10:13 UTC
Post by R.Wieser
58 pop eax
48 dec eax
81 C0 0C 00 00 00 add eax, 000000Ch
50 push eax
48 dec eax
AD lodsd
FF 20 jmp dword ptr [eax]
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
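
A minimal Python sketch (not from the thread) of how a 4x byte splits into the
four REX flag bits in 64-bit mode. The bit layout 0100WRXB is the documented
REX format; the function name decode_rex is made up for this illustration.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0x40..0x4F) into its W, R, X and B bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,   # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,   # extends the ModRM reg field
        "X": (byte >> 1) & 1,   # extends the SIB index field
        "B": byte & 1,          # extends ModRM r/m, SIB base or opcode register
    }


print(decode_rex(0x48))   # the prefix in the disassembly above
print(decode_rex(0x49))   # the prefix from the start of the thread
```

So 48 only widens the operand to 64 bits, while 49 additionally switches the
register encoded in the opcode or r/m field to one of r8..r15.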

Stephen
--
Stephen Pelc, ***@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612
web: http://www.mpeforth.com - free VFX Forth downloads
none) (albert
2020-03-09 14:57:40 UTC
Post by Stephen Pelc
Post by R.Wieser
58 pop eax
48 dec eax
81 C0 0C 00 00 00 add eax, 000000Ch
50 push eax
48 dec eax
AD lodsd
FF 20 jmp dword ptr [eax]
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
Thanks for helping me out in explaining this to some
asm.x86 denizens.

Please note that I did not publish this disassembly.
This was my disassembly, accompanied by a hex dump,
with the Quadruple prefixes.
CODE PIET3
QN: MOVI|X, AX| 2 IL,
QN: PUSH|X, AX|
NEXT,
END-CODE
Post by Stephen Pelc
Stephen
Groetjes Albert
Alex McDonald
2020-03-09 15:42:19 UTC
Post by Stephen Pelc
Post by R.Wieser
58 pop eax
48 dec eax
81 C0 0C 00 00 00 add eax, 000000Ch
50 push eax
48 dec eax
AD lodsd
FF 20 jmp dword ptr [eax]
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
Stephen
As can be seen by disassembling in 32 bit mode and 64 bit mode;

mode64

10 constant ten
10. 2constant ten.
' ten constant 'ten

code bar next;
code foo
mov al ten
mov ah ten
mov ax ten
mov eax ten
mov rax ten.
mov al { 'ten }
mov ah { 'ten }
mov ax { 'ten }
mov eax { ' bar }
mov rax { 'ten }
movzx rax word { 'ten }
mov rax { 10. }
next;


see foo
code foo ( ? -- ? )
\ foo is defined in src/test1.fs at line 9
\ code=$41B9D1 len=73 type=129
( $0 ) mov al $A \ B00A
( $2 ) mov ah $A \ B40A
( $4 ) mov ax $A \ 66B80A00
( $8 ) mov eax $A \ B80A000000
( $D ) mov rax $A $0 \ 48B8000000000A000000
( $17 ) mov al byte { rip ' ten } \ 8A05B0FFFFFF
( $1D ) mov ah byte { rip ' ten } \ 8A25AAFFFFFF
( $23 ) mov ax word { rip ' ten } \ 668B05A3FFFFFF
( $2A ) mov eax dword { rip ' bar } \ 8B05CAFFFFFF
( $30 ) mov rax qword { rip ' ten } \ 488B0596FFFFFF
( $37 ) movzx rax word { rip ' ten } \ 480FB7058EFFFFFF
( $3F ) mov rax qword { $A $0 } \ 48A1000000000A000000
( $49 ) ret \ C3 ( end )

There are some interesting encodings that can be seen here, courtesy of
the REX prefix.

1. (at $D) the 64 bit immediate form of MOV RAX has a 64 bit operand.
(at $8) The EAX form is identical in effect if the constant is <= 32
bits (the high order bits in RAX are zeroed), but it encodes a good deal
shorter.

2. relative IP (RIP) addressing (there's an 8 bit form but it needs a
base register)

3. (at $3F) absolute addressing a 64 bit address.

When disassembled in 32 bit mode, yeuch. It's all wrong.

mode32
see foo
code foo ( ? -- ? )
\ foo is defined in src/test1.fs at line 9
\ code=$41B9D1 len=73 type=129
( $0 ) mov al $A \ B00A
( $2 ) mov ah $A \ B40A
( $4 ) mov ax $A \ 66B80A00
( $8 ) mov eax $A \ B80A000000
( $D ) dec eax \ 48
( $E ) mov eax $0 \ B800000000
( $13 ) or al byte { eax } \ 0A00
( $15 ) add byte { eax } al \ 0000
( $17 ) mov al byte { $-50 } \ 8A05B0FFFFFF
( $1D ) mov ah byte { $-56 } \ 8A25AAFFFFFF
( $23 ) mov ax word { $-5D } \ 668B05A3FFFFFF
( $2A ) mov eax dword { $-36 } \ 8B05CAFFFFFF
( $30 ) dec eax \ 48
( $31 ) mov eax dword { $-6A } \ 8B0596FFFFFF
( $37 ) dec eax \ 48
( $38 ) movzx eax word { $-72 } \ 0FB7058EFFFFFF
( $3F ) dec eax \ 48
( $40 ) mov eax dword { $0 } \ A100000000
( $45 ) or al byte { eax } \ 0A00
( $47 ) add byte { eax } al \ 0000
( $49 ) ret \ C3 ( end )

It's still possible to write INC reg in 64 bit mode, and this is also
exactly the same code in 32 bit mode, because there's a 2 byte form of the
$4x opcodes;

( $0 ) inc al \ FEC0
( $2 ) inc ah \ FEC4
( $4 ) inc ax \ 66FFC0
( $7 ) inc eax \ FFC0
( $9 ) ret \ C3 ( end )
--
Alex
JJ
2020-03-10 05:09:56 UTC
Post by Stephen Pelc
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
Stephen
That's interesting. I didn't realize that there are two sets of opcodes for
`INC/DEC <reg>` instructions in x86: `4x` and `FF Cx`. Is there any article
about it? Why are there two sets of them? And if one was added later, when was
that?
wolfgang kern
2020-03-10 07:24:33 UTC
Post by JJ
Post by Stephen Pelc
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
That's interresting. I didn't realize that there are two sets of opcode for
`INC/DEC <reg>` instructions in x86: `4x` and `FF Cx`. Is there any article
about it? Why there's two set of them? And if one was added later, when was
that?
There have been lots of duplicate opcodes right from the start, caused
by incomplete decoding, but it was cheaper to keep them ...

the classical LOAD vs STORE group:

8A C1 == 88 C8
8A C2 == 88 D0
...
8B FE == 89 F7

and much more.
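
A minimal Python sketch (not from the thread) that decodes the register-to-register
pairs listed above and shows that each pair names the same operation. It
assumes a 32-bit operand size for the 89/8B forms and register-direct ModRM
(mod = 3); the function name decode_mov_rr is made up for this illustration.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_rr(code: bytes) -> str:
    """Decode a two-byte register-to-register MOV (opcodes 88, 89, 8A, 8B)."""
    opcode, modrm = code
    if modrm >> 6 != 3:
        raise ValueError("only register-direct ModRM handled")
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    names = REG8 if opcode in (0x88, 0x8A) else REG32
    if opcode in (0x88, 0x89):        # MOV r/m, reg: destination is the r/m field
        dst, src = rm, reg
    elif opcode in (0x8A, 0x8B):      # MOV reg, r/m: destination is the reg field
        dst, src = reg, rm
    else:
        raise ValueError("not a MOV r/m,reg or MOV reg,r/m opcode")
    return f"mov {names[dst]}, {names[src]}"


for a, b in [("8A C1", "88 C8"), ("8A C2", "88 D0"), ("8B FE", "89 F7")]:
    print(a, "->", decode_mov_rr(bytes.fromhex(a)), "|",
          b, "->", decode_mov_rr(bytes.fromhex(b)))
```

The direction bit in the opcode and the swapped reg and r/m fields cancel out,
which is why both encodings of each pair do the same thing.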

find the whole instruction set in sandpile.org.
__
wolfgang
none) (albert
2020-03-10 08:22:58 UTC
Post by wolfgang kern
Post by JJ
Post by Stephen Pelc
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
That's interresting. I didn't realize that there are two sets of opcode for
`INC/DEC <reg>` instructions in x86: `4x` and `FF Cx`. Is there any article
about it? Why there's two set of them? And if one was added later, when was
that?
There are lots of duplicate opcodes already from the start of it, Caused
by incomplete decoding, but it was cheaper to keep them ...
The correct reason is that the 8086 had a tiny memory range.
It paid off to shave one byte off two-byte instructions for
special cases to accommodate "larger" programs.
This is true for XCHG MOV INC/DEC POP. The shorter instruction
accommodates only a register addressing mode, or only register AX/L/H.
See Stephen Morse, "The 8086 Primer",
ISBN 0-8104-5165-4
Post by wolfgang kern
wolfgang
Terje Mathisen
2020-03-10 11:11:38 UTC
Post by JJ
Post by Stephen Pelc
In AMD64, the 4x instructions are *all* REX prefices. The 48
instruction is NOT a DEC reg instruction, it's a REX prefix.
Stephen
That's interresting. I didn't realize that there are two sets of opcode for
`INC/DEC <reg>` instructions in x86: `4x` and `FF Cx`. Is there any article
about it? Why there's two set of them? And if one was added later, when was
that?
I suspect they first defined the single-byte opcodes for things like
PUSH/POP reg, INC/DEC/RET/IRET/CLD/STD/CLC etc, many/most of them
inherited from the 8080, then we got the two-byte opcodes for an
accumulator machine, with almost everything going through AL/AX, along
with the Jcc/JMP short branches. Finally they added a much more modern
reg/mem decoding engine which as a side-effect meant that most of the
reg-reg forms with AX as the target would duplicate the short form.

I once stumbled over this when challenging Mike Abrash to improve some
code I had written and was very proud of:

He pointed out that I had written something like 'CMP BX,AX' instead of
the opposite 'CMP AX,BX' and the AX target form would be a byte shorter
and therefore also a few percent faster.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"