Discussion:
Expand-down v. expand-up stack
Rick C. Hodgin
2015-07-12 09:43:15 UTC
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?

Why would someone choose one design over another?

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-12 09:55:24 UTC
Note: This would be for a new architecture, not one confined to x86's
legacy baggage.

Best regards,
Rick C. Hodgin
n***@gmail.com
2015-07-14 06:26:03 UTC
Separate stack space (Super) Harvard-N (2, 3... I, D, S). What else?
Micro/macro code, kernel/super/user states, protection rings.
HLL structs, user/return stacks, operand, dictionary, execution, graphics. HLE threads, CAR/CDR
John Dallman
2015-07-12 10:25:00 UTC
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.

However, if you have large amounts of memory, measured in hundreds of MB
or more, and/or multiple threads (and thus stacks) per process, this idea
isn't very useful any more.

If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.

https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Security

John
Michael S
2015-07-12 10:46:39 UTC
Post by John Dallman
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.
Wouldn't heap growing downwards + stack growing upwards achieve the same effect?
Post by John Dallman
However, if you have large amounts of memory, measured in hundreds of MB
or more, and/or multiple threads (and thus stacks) per process, this idea
isn't very useful any more.
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Security
John
I agree that direction does not matter a lot, but I still think it matters a little. Local buffer overruns toward higher indices appear to be more common than overruns toward lower (in "C", negative) indices. So, with a stack growing upward, local buffer overruns will be a little harder to exploit.
Robert Wessel
2015-07-12 11:57:22 UTC
On Sun, 12 Jul 2015 03:46:39 -0700 (PDT), Michael S
Post by Michael S
Post by John Dallman
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.
Wouldn't heap growing downwards + stack growing upwards achieve the same effect?
Post by John Dallman
However, if you have large amounts of memory, measured in hundreds of MB
or more, and/or multiple threads (and thus stacks) per process, this idea
isn't very useful any more.
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Security
John
I agree that direction does not matter a lot, but I still think it matters a little. Local buffer overruns toward higher indices appear to be more common than overruns toward lower (in "C", negative) indices. So, with a stack growing upward, local buffer overruns will be a little harder to exploit.
It's a common misconception that an upwards growing stack is less
vulnerable to buffer overflow/stack smashing attacks. In one limited
case that's semi-correct, where the buffer being overflowed is in the
routine that is doing the overflowing, and there is no active return
address on the stack after the overflowed buffer. Unfortunately the
vast majority of real stack smashes use subroutines to do the dirty
work in a buffer owned by a caller. That just moves the point at
which the bad return address is used. Consider:

#include <string.h>

void f(void)
{
    char s[4];
    strcpy(s, "abcdefghjklmnopqrstuvwxyz");
}

With a typical downward growing stack, the smashed return address will
be the one from f(). With an upwards growing stack, it'll be the
return from strcpy() that's altered. Not exactly a huge improvement.
Michael S
2015-07-12 12:48:53 UTC
Post by Robert Wessel
On Sun, 12 Jul 2015 03:46:39 -0700 (PDT), Michael S
Post by Michael S
Post by John Dallman
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.
Wouldn't heap growing downwards + stack growing upwards achieve the same effect?
Post by John Dallman
However, if you have large amounts of memory, measured in hundreds of MB
or more, and/or multiple threads (and thus stacks) per process, this idea
isn't very useful any more.
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Security
John
I agree that direction does not matter a lot, but I still think it matters a little. Local buffer overruns toward higher indices appear to be more common than overruns toward lower (in "C", negative) indices. So, with a stack growing upward, local buffer overruns will be a little harder to exploit.
It's a common misconception that an upwards growing stack is less
vulnerable to buffer overflow/stack smashing attacks. In one limited
case that's semi-correct, where the buffer being overflowed is in the
routine that is doing the overflowing, and there is no active return
address on the stack after the overflowed buffer. Unfortunately the
vast majority of real stack smashes use subroutines to do the dirty
work in a buffer owned by a caller. That just moves the point at
which the bad return address is used. Consider:
void f(void)
{
char s[4];
strcpy(s, "abcdefghjklmnopqrstuvwxyz");
}
With a typical downward growing stack, the smashed return address will
be the one from f(). With an upwards growing stack, it'll be the
return from strcpy() that's altered. Not exactly a huge improvement.
On many architectures the return address of a leaf function would never be on the stack.
Some mitigation is possible even on architectures that are principally similar to x86: strcpy() and other susceptible library functions can be coded in a way that reads the return address into a register before doing the actual copy. Of course, on x86 itself doing so makes little sense, first because the stack grows downwards, and second because of the significant performance penalty of fouling the return-address stack predictor.
Piotr Wyderski
2015-07-14 11:02:03 UTC
Post by Michael S
strcpy() and other susceptible library functions can be coded in a way
that reads return address into register before doing actual copy. Of
course, on x86 itself doing so makes a little sense, first because the
stack grows downwards and second because of significant performance
penalty of fouling return address stack predictor.

IMHO there need not be any fouling: restore the return
address from the register just before the ret.

Best regards, Piotr
David Brown
2015-07-22 11:18:55 UTC
Post by Michael S
Post by John Dallman
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.
Wouldn't heap growing downwards + stack growing upwards achieve the same effect?
It's slightly easier to track data in a heap that grows upwards, and
would likely be a little more cache efficient (when you allocate a new
large object on the heap, you probably access it from the bottom upwards).

But I don't think there would be any serious issues with having a heap
that grows downwards and a stack that grows upwards - it's just that
there would be no benefits, and it would surprise people.
George Neuner
2015-07-22 16:13:43 UTC
On Wed, 22 Jul 2015 13:18:55 +0200, David Brown
Post by David Brown
It's slightly easier to track data in a heap that grows upwards, and
would likely be a little more cache efficient (when you allocate a new
large object on the heap, you probably access it from the bottom upwards).
But I don't think there would be any serious issues with having a heap
that grows downwards and a stack that grows upwards - it's just that
there would be no benefits, and it would surprise people.
There are compacting - i.e. moving - GC heap implementations that
alternate allocation from top and bottom, and others that can allocate
from either end depending on the expected lifetime of the allocation.

Neither confuses people - largely because the languages involved don't
permit explicit pointer arithmetic.

George
Nick Maclaren
2015-07-12 10:59:21 UTC
Post by John Dallman
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
No major ones, as far as I can see. However, once an architecture has
made a choice, changing it is likely to be more trouble than it's worth.
Experience from the days when there were often both on the same
system (and I don't just mean architecture) is that there is damn-all
difference. Sometimes artifacts of the instruction code favour one
over the other.
Post by John Dallman
Post by Rick C. Hodgin
Why would someone choose one design over another?
On operating systems that are limited to a single thread per process, and
have small memories, it can be convenient (or at least simple) to have
the stack growing downwards from the top of data memory and the heap
growing upwards from the bottom. This avoids the need to have separate
areas for stack and heap, at the cost of needing to check regularly they
haven't collided.
Which can be done equally well the other way. Been there - done that.
Post by John Dallman
However, if you have large amounts of memory, measured in hundreds of MB
or more, and/or multiple threads (and thus stacks) per process, this idea
isn't very useful any more.
Actually, yes, it is. But you would do it per thread, and the 'heap'
would be the thread-local data. Alternatively, you can do that for
the primary and secondary stacks (see below).
Post by John Dallman
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Security
That merely changes a direct stack smashing into a pointer-smashing
one and, with procedure pointers, that's not much harder. Even
without them, it's only a little harder.

That is a crude variant of the double stack approach used (first?)
in Algol 68 that seems to be used only in GNU Ada, and can give
both better security and much better performance for no program
impact! The primary stack includes the linkage (return addresses,
exception handling if needed, small scalar constants and the
pointers to large or dynamically-sized data). The secondary stack
holds those large or dynamically-sized objects themselves.


Regards,
Nick Maclaren.
EricP
2015-07-12 16:49:41 UTC
Post by Nick Maclaren
That is a crude variant of the double stack approach used (first?)
in Algol 68 that seems to be used only in GNU Ada, and can give
both better security and much better performance for no program
impact! The primary stack includes the linkage (return addresses,
exception handling if needed, small scalar constants and the
pointers to large or dynamically-sized data). The secondary stack
holds those large or dynamically-sized objects themselves.
I don't know about Algol but Ada allows functions to return
variable sized objects, such as an array or variant record
that is declared in the callee and returned to the caller.

There are various ways to accomplish this, but 2 stacks is easiest.
The caller copies the current stack2 pointer, then calls the function.
The callee just moves the stack2 pointer down to allocate the object,
and the return pops the callee frame from stack1, but leaves stack2.
The caller uses the return object on stack2, then recovers
the space used on stack2 by copying back its prior pointer.

An alternative is to allocate on the heap, but then the caller
has to also establish an exception handler to recover the return
object to the heap in case an exception occurs. The combination of
heap alloc + free + handler establish + remove is more expensive
than 2 stack approach.

Another variant was the method VAX Ada used that only required
a single stack and no handler, but can be somewhat expensive.
The caller copies the stack pointer and calls the function.
The callee allocates the return object by moving the whole
callee frame down, thereby opening a hole after the caller.
It returns a pointer to that object and the callee frame
is removed on return.

They did this because they were trying to follow the
VAX CALLS/CALLG hardware calling standard which was designed
to automatically restore the caller state on return,
which was exactly what their code did NOT want to do.

That is one reason to urge caution to those who design fancy
hardware call/return mechanisms, in that the software may require
some function that the hardware designers never anticipated.
The software winds up fighting against the built in mechanism,
usually at some expense.

RISC-like ISA's have no prescribed stack, no push and pop instructions
that assume a particular stack pointer register (or stack number),
or growth direction, or prescribed place for call/return info.

Eric
Nick Maclaren
2015-07-12 18:03:33 UTC
Post by EricP
Post by Nick Maclaren
That is a crude variant of the double stack approach used (first?)
in Algol 68 that seems to be used only in GNU Ada, and can give
both better security and much better performance for no program
impact! The primary stack includes the linkage (return addresses,
exception handling if needed, small scalar constants and the
pointers to large or dynamically-sized data). The secondary stack
includes those.
I don't know about Algol but Ada allows functions to return
variable sized objects, such as an array or variant record
that is declared in the callee and returned to the caller.
As did Algol 68. While it has vanished with no apparent descendants,
several of the higher-level languages (e.g. Fortran and Ada) adopted
quite a few of its successful features.


Regards,
Nick Maclaren.
Quadibloc
2015-07-13 22:49:59 UTC
Post by John Dallman
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
Of course, the System/360, which was a rather secure machine, avoided the
problem entirely by not having a stack. Each program that called other programs
simply had its own save area, not connected with the save area of the program
that called it.

And thus, in general, the loader gave each program the amount of static storage
that it asked for - there was not a question of programs living in the stack.
But you could write re-entrant code on a 360, and thus you did have resident
libraries that could be called from multiple different programs running on a
system - like, say, the trig functions. Not unlike what is done today with the
.DLL in Windows, but with fewer problems (although that was largely because far
less was demanded of it).

John Savard
Stephen Fuld
2015-07-14 00:13:17 UTC
Post by Quadibloc
Post by John Dallman
If you're designing a new architecture, it is worth considering having
separate stacks for return addresses and for data, so as to make "stack
smashing" security attacks harder.
Of course, the System/360, which was a rather secure machine, avoided the
problem entirely by not having a stack. Each program that called other programs
simply had its own save area, not connected with the save area of the program
that called it.
And thus, in general, the loader gave each program the amount of static storage
that it asked for - there was not a question of programs living in the stack.
But you could write re-entrant code on a 360, and thus you did have resident
libraries that could be called from multiple different programs running on a
system - like, say, the trig functions. Not unlike what is done today with the
.DLL in Windows, but with fewer problems (although that was largely because far
less was demanded of it).
Yes. I think that was true of most of the mainframe architectures of
the time, with the exception of the Burroughs large scale architecture
of course.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
John Levine
2015-07-14 02:08:54 UTC
Post by Quadibloc
Of course, the System/360, which was a rather secure machine, avoided the
problem entirely by not having a stack. Each program that called other programs
simply had its own save area, not connected with the save area of the program
that called it.
The calling sequence was (is, I suppose, still in use on zSeries)
pretty well designed and flexible. What you describe is the way
Fortran and COBOL implemented it. PL/I did a stack in software,
keeping the calling sequence the same, but allowing recursion.

Algol F did it pessimally, doing GETMAIN and FREEMAIN system calls at
procedure entry and exit. Some people at Princeton made some obvious
changes to get bigger chunks and use them for multiple calls and
programs apparently ran an order of magnitude faster.
Nick Maclaren
2015-07-14 07:59:08 UTC
Post by John Levine
Post by Quadibloc
Of course, the System/360, which was a rather secure machine, avoided the
problem entirely by not having a stack. Each program that called other programs
simply had its own save area, not connected with the save area of the program
that called it.
The calling sequence was (is, I suppose, still in use on zSeries)
pretty well designed and flexible. What you describe is the way
Fortran and COBOL implemented it. PL/I did a stack in software,
keeping the calling sequence the same, but allowing recursion.
Algol F did it pessimally, doing GETMAIN and FREEMAIN system calls at
procedure entry and exit. Some people at Princeton made some obvious
changes to get bigger chunks and use them for multiple calls and
programs apparently ran an order of magnitude faster.
And there were dozens of third-party compilers and applications that
used stacks, rising and falling, single and double (well, one was
double).


Regards,
Nick Maclaren.
Anne & Lynn Wheeler
2015-07-14 04:49:18 UTC
Post by Quadibloc
Of course, the System/360, which was a rather secure machine, avoided the
problem entirely by not having a stack. Each program that called other programs
simply had its own save area, not connected with the save area of the program
that called it.
calling program provided register save area for use by called program
... simple case was static area within the calling area ... but didn't
support any sort of recursion.

call/save/return conventions from gcard ios3270 ... q&d html conversion
http://www.garlic.com/~lynn/gcard.html#50

as mentioned, reentrant (&/or r/o image) code required dynamic allocation (in
lieu of an embedded static area) ... which tended to be an extremely
high-overhead system call.
--
virtualization experience starting Jan1968, online at home since Mar1970
Joe Chisolm
2015-07-12 15:38:06 UTC
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there any
advantages to using an expand-down stack compared to an expand-up stack?
Why would someone choose one design over another?
Best regards,
Rick C. Hodgin
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
Rick C. Hodgin
2015-07-12 16:53:06 UTC
I'm thinking with the isolation provided by the
segment selectors on i286 and later (with their
base and limit) it was not needed. The natural
memory protection mechanisms would seem to
be sufficient.

Seems that maybe it was added to support
legacy designs from the 8086, and the many
software libraries that had been written?

If SS: is an expand-up segment, do PUSH/POP,
ENTER/LEAVE work correctly (going up rather
than down)?

Best regards,
Rick C. Hodgin
Noob
2015-07-15 08:15:24 UTC
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX

Regards.
Michael S
2015-07-15 08:27:35 UTC
Post by Noob
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX
Regards.
Assuming that "some of the big iron folks" in the sentence above meant Burroughs B6500 and follow-ups, I see no relationship at all.
Noob
2015-07-15 08:36:30 UTC
Post by Michael S
Post by Noob
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX
Assuming that "some of the big iron folks" in the sentence above
meant Burroughs B6500 and follow-ups, I see no relationship at all.
I don't know how "big iron folks" solved the problem, but he mentioned
"stack pointer registers that had bounds checking" and, well, MPX seems
to fit that bill, doesn't it?

Quoting Jon Corbet, "MPX is, at its core, a hardware-assisted mechanism
for performing bounds checking on pointer accesses."

https://lwn.net/Articles/582712/

Regards.
Michael S
2015-07-15 12:46:26 UTC
Post by Noob
Post by Michael S
Post by Noob
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX
Assuming that "some of the big iron folks" in the sentence above
meant Burroughs B6500 and follow-ups, I see no relationship at all.
I don't know how "big iron folks" solved the problem, but he mentioned
"stack pointer registers that had bounds checking" and, well, MPX seems
to fit that bill, doesn't it?
No, it does not. Read the spec. MPX checks are explicit.
Post by Noob
Quoting Jon Corbet, "MPX is, at its core, a hardware-assisted mechanism
for performing bounds checking on pointer accesses."
https://lwn.net/Articles/582712/
Regards.
Mr. Corbet is confused. The MPX pointer checks are separate from actual memory access. They are far more similar to (80186) bound instruction than to (80286) segment base/limit checks.
Noob
2015-07-15 13:15:11 UTC
Post by Michael S
Post by Noob
Post by Michael S
Post by Noob
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX
Assuming that "some of the big iron folks" in the sentence above
meant Burroughs B6500 and follow-ups, I see no relationship at all.
I don't know how "big iron folks" solved the problem, but he mentioned
"stack pointer registers that had bounds checking" and, well, MPX seems
to fit that bill, doesn't it?
No, it does not. Read the spec. MPX checks are explicit.
Why is the fact that the checks are explicit or implicit anything
more than an implementation detail? Because programs have to be
recompiled to benefit? (In fact, a dynamically-linked instrumented
libc might already flag a few bugs.)
Post by Michael S
Post by Noob
Quoting Jon Corbet, "MPX is, at its core, a hardware-assisted mechanism
for performing bounds checking on pointer accesses."
https://lwn.net/Articles/582712/
Mr. Corbet is confused. The MPX pointer checks are separate from
actual memory access. They are far more similar to (80186) bound
instruction than to (80286) segment base/limit checks.
I don't think Jon thinks the checks are implicit ("Whenever a pointer is
dereferenced, special instructions can be used to ensure that the program
is accessing memory within the range specified for that particular pointer.")
but anyway I don't understand why it matters.

Regards.
Michael S
2015-07-15 14:38:50 UTC
Post by Noob
Post by Michael S
Post by Noob
Post by Michael S
Post by Noob
Post by Joe Chisolm
The idea is for heap to grow up and stack down and make sure they
never meet. One approach is to access protect the page just past
the end of the heap. If the stack grows into that page you fault
before you overrun the heap. This could have all been dismissed
years ago if Intel et. al. had followed some of the big iron folks
with stack pointer registers that had bounds checking.
It seems Intel has decided to move in that direction with Skylake.
https://en.wikipedia.org/wiki/Intel_MPX
Assuming that "some of the big iron folks" in the sentence above
meant Burroughs B6500 and follow-ups, I see no relationship at all.
I don't know how "big iron folks" solved the problem, but he mentioned
"stack pointer registers that had bounds checking" and, well, MPX seems
to fit that bill, doesn't it?
No, it does not. Read the spec. MPX checks are explicit.
Why is the fact that the checks are explicit or implicit anything
more than an implementation detail? Because programs have to be
recompiled to benefit?
That's a pretty important implementation detail, I should say; so important that it's not an implementation detail any more. Completely different trade-offs on many levels: the code is bigger, but potentially fewer checks are actually executed, so likely more energy efficient. Also, the checks are taken away from the memory-access path, so they can't compromise L1D read timing.

Post by Noob
(In fact, a dynamically-linked instrumented
libc might already flag a few bugs.)
A few bugs, yes, but without the compiler's aid not that many.
On the other hand, an "instrumented" compiler can do without MPX everything it can do with MPX, just a bit slower and at a higher cost in terms of code size. So, in that sense, MPX itself is no more than an implementation detail.
Post by Noob
Post by Michael S
Post by Noob
Quoting Jon Corbet, "MPX is, at its core, a hardware-assisted mechanism
for performing bounds checking on pointer accesses."
https://lwn.net/Articles/582712/
Mr. Corbet is confused. The MPX pointer checks are separate from
actual memory access. They are far more similar to (80186) bound
instruction than to (80286) segment base/limit checks.
I don't think Jon thinks the checks are implicit ("Whenever a pointer is
dereferenced, special instructions can be used to ensure that the program
is accessing memory within the range specified for that particular pointer.")
but anyway I don't understand why it matters.
Regards.
It matters in the context of MPX not being similar to how the "big iron folks" did things.
On the other hand, 80286 segments, and their 32-bit version added with the i386, are similar to how some of the "big iron folks" did things. They have been on x86 for more than 30 years and have been almost completely ignored by languages and OSes, especially for the last 20 years, so there are probably good reasons why PC software infrastructure people don't consider them useful.
Also, although MPX can be used to prevent a down-growing stack from meeting the up-growing heap, it's not really intended for that sort of usage. The intended use is to bounds-check individual objects, including those on the stack, rather than whole segments like the stack and heap.
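The per-object model can be mimicked in plain C; a minimal sketch, with names of my own invention (real MPX keeps the bounds in BND registers and checks them with BNDCL/BNDCU):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A software sketch of MPX-style per-object bounds: a "fat pointer"
   carrying lower/upper bounds alongside the working pointer. */
typedef struct {
    int32_t *ptr;      /* the working pointer             */
    int32_t *lower;    /* first valid element             */
    int32_t *upper;    /* one past the last valid element */
} bounded_ptr;

/* Explicit check, separate from the access itself, just as the MPX
   check instructions are separate from the load/store they guard. */
static int in_bounds(const bounded_ptr *p, ptrdiff_t idx)
{
    return p->ptr + idx >= p->lower && p->ptr + idx < p->upper;
}

static int32_t checked_load(const bounded_ptr *p, ptrdiff_t idx)
{
    assert(in_bounds(p, idx));   /* the compiler-inserted check */
    return p->ptr[idx];
}
```

For a four-element object, checked_load at index 2 succeeds while in_bounds reports 0 for index 4; note the check guards this one object, not a whole segment.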
Noob
2015-07-16 14:07:48 UTC
Permalink
On 15/07/2015 16:38, Michael S wrote: [snip]

Thanks for the explanation.
Ivan Godard
2015-07-15 16:59:35 UTC
Permalink
Post by Noob
Post by Michael S
Mr. Corbet is confused. The MPX pointer checks are separate from
actual memory access. They are far more similar to (80186) bound
instruction than to (80286) segment base/limit checks.
I don't think Jon thinks the checks are implicit ("Whenever a pointer is
dereferenced, special instructions can be used to ensure that the program
is accessing memory within the range specified for that particular pointer.")
but anyway I don't understand why it matters.
Implicit and unavoidable checks are better.

Explicit checks are subject to false optimization that omits and/or
removes them. Then software must maintain multiple libraries and
interfaces, with and without, at cost to both developer and user. In
addition, the hardware must contain both the logic to do the check, and
also the logic to decode and apply the request for the check, which then
must be a separate step in the hardware rather than integrated (at much
lower cost) within the operation being checked.
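The "false optimization" hazard is real for hand-written explicit checks too; a sketch of the classic case (the overflow-check deletion famously bit real code, see CERT advisory VU#162289):

```c
/* A hand-written "explicit" overflow check that an optimizer is
   entitled to delete outright: if buf + len wraps, the behaviour is
   already undefined, so the compiler may assume the branch is dead. */
int wrap_check(char *buf, unsigned long len)
{
    if (buf + len < buf)    /* UB if it wraps, hence "always false" */
        return 0;
    return 1;
}

/* Well-defined alternative: compare sizes, not overflowed pointers.
   Does [off, off+len) fit inside an object of 'size' bytes? */
int fits(unsigned long off, unsigned long len, unsigned long size)
{
    return off <= size && len <= size - off;
}
```

An implicit, unavoidable hardware check has no such failure mode: there is no source-level branch for the optimizer to reason away.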
MitchAlsup
2015-07-12 18:27:21 UTC
Permalink
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
Terje Mathisen
2015-07-12 18:34:10 UTC
Permalink
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
This alone should be sufficient reason to use signed offsets. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Anton Ertl
2015-07-13 05:33:32 UTC
Permalink
Post by Terje Mathisen
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
This alone should be sufficient reason to use signed offsets. :-)
Makes me wonder how often negative offsets occur in architectures that
have signed offsets.

The 88k had zero-extended offsets. I don't remember that being a
problem, but then I did not do much on the assembly level. Maybe
Mitch Alsup can tell us what the reasoning was behind that and how it
worked out.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
***@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Terje Mathisen
2015-07-13 08:40:56 UTC
Permalink
Post by Anton Ertl
Post by Terje Mathisen
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
This alone should be sufficient reason to use signed offsets. :-)
Makes me wonder how often negative offsets occur in architectures that
have signed offsets.
16-bit x86 default ABI would use BP for all stack-relative addressing,
with BP+offset pointing at incoming parameters and BP-offset to local
stack variables:

PUSH AX
CALL foo
...

foo:
PUSH BP
MOV BP,SP
SUB SP,4

At this point we have the incoming parameter (from AX above) located at
[BP+4] (the near CALL's 2-byte return address sits at [BP+2]) and the
two local variables at [BP-4] and [BP-2]

The original Turbo Pascal generated this style of code for pretty much
all function calls, I know I have seen an awful lot of disassembly
looking like that. :-)
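The offsets can be replayed with simple arithmetic (a sketch; 100 is an arbitrary starting SP, words are 2 bytes, and the stack grows down):

```c
#include <stdint.h>

/* Replay of the 16-bit call sequence above, recording where each item
   lands relative to BP. */
typedef struct { int param_off, ret_off, local_top_off; } frame_layout;

static frame_layout layout_16bit_frame(void)
{
    uint16_t sp = 100, bp, addr_param, addr_ret;

    sp -= 2; addr_param = sp;   /* PUSH AX                     */
    sp -= 2; addr_ret   = sp;   /* CALL foo pushes a 2-byte IP */
    sp -= 2;                    /* PUSH BP                     */
    bp = sp;                    /* MOV  BP,SP                  */
    sp -= 4;                    /* SUB  SP,4                   */

    frame_layout f = { addr_param - bp, addr_ret - bp,
                       (int)sp - (int)bp };
    return f;
}
```

This gives param_off == 4 and ret_off == 2: the incoming parameter at [BP+4], the near return address at [BP+2], and the locals at [BP-2] and [BP-4].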
Post by Anton Ertl
The 88k had zero-extended offsets. I don't remember that being a
problem, but then I did not do much on the assembly level. Maybe
Mitch Alsup can tell us what the reasoning was behind that and how it
worked out.
As you wrote in another message, the compiler just has to pre-decrement
the frame pointer so that all needed offsets will be positive.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
MitchAlsup
2015-07-13 22:23:45 UTC
Permalink
Post by Anton Ertl
The 88k had zero-extended offsets. I don't remember that being a
problem, but then I did not do much on the assembly level. Maybe
Mitch Alsup can tell us what the reasoning was behind that and how it
worked out.
In practice, the linker would recognize that register Rk had the high
order bits of the offset (because it was previously loaded) and just
use that register as the base pointer. The signed version can often
require one more register, since a carry out of bit 15 needs a different
base pointer to carry into.

But overall it is workable either way.

My current ISA provides enough bits that this kind of tomfoolery is
not necessary. It has 16-, 32-, and 64-bit offsets/displacements, with
64-bit pointers and indexes.

Mitch
Joe keane
2015-07-14 17:42:05 UTC
Permalink
Post by Anton Ertl
Makes me wonder how often negative offsets occur in architectures that
have signed offsets.
25%
Rick C. Hodgin
2015-07-14 17:44:22 UTC
Permalink
Post by Anton Ertl
Makes me wonder how often negative offsets occur in architectures that
have signed offsets.
25%
Citation?

Best regards,
Rick C. Hodgin
Quadibloc
2015-07-13 22:42:23 UTC
Permalink
Post by Terje Mathisen
Post by MitchAlsup
Expand down stacks are preferred when the offset/displacement is positive
only.
This alone should be sufficient reason to use signed offsets. :-)
1) Only if there were some reason to use expand-up stacks.

2) A 'positive only' offset _is_ a signed offset, in two's complement
representation, unless the offset is too short to express the entire virtual
address space, or unless overflows in this operation fail to be ignored.

John Savard
Rick C. Hodgin
2015-07-12 18:35:39 UTC
Permalink
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
Makes sense. On x86, the mod/reg/rm encoding associates a signed
displacement allowing it to work either way equally well.

We typically use [ebp-N] and [esp+N], and also see such references
generated by compilers, but these are all used on expand-down segments.

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-12 18:41:20 UTC
Permalink
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
Makes sense. On x86, the mod/reg/rm encoding associates a signed
displacement allowing it to work either way equally well.

We typically use [ebp-N] and [esp+N], and also see such references
generated by compilers, but these are all used on expand-down
segments. It would seem that using [ebp+N] and [esp-N] would work
equally well if expand-down had never been invented. :-)

I'm wondering if the physical execution of the PUSH/POP and
ENTER/LEAVE commands only operate in the expand-down form. I
have never done any testing on this as I always just followed
the herd and went with expand-down segments.

Looking now at the Intel 64 and IA-32 Architecture manual:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

I see in section 6.2 it explicitly states that the operation of the
stack is in the expand-down form. If true, this seems to be a glaring
shortcoming in the i286 and later design.

Best regards,
Rick C. Hodgin
Ivan Godard
2015-07-12 18:48:38 UTC
Permalink
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
Hunh??? Only if you are using SP as FP. But the existence of alloca and
dynamic arrays makes that impractical; you get poor code for call
idioms. Any architecture not hamstrung by legacy restrictions on number
of registers would use a different FP and SP, and for those a grow-up
uses positive offsets too.
Nick Maclaren
2015-07-12 19:24:20 UTC
Permalink
Post by Ivan Godard
Post by MitchAlsup
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
Expand down stacks are preferred when the offset/displacement is positive
only.
Hunh??? Only if you are using SP as FP. But the existence of alloca and
dynamic arrays makes that impractical; you get poor code for call
idioms. Any architecture not hamstrung by legacy restrictions on number
of registers would use a different FP and SP, and for those a grow-up
uses positive offsets too.
Right. On the System/370, most stack-based programs used rising
stacks. The only time that the stack pointer ever needed to be in
a register was during the actual linkage. Not a problem.


Regards,
Nick Maclaren.
Anton Ertl
2015-07-13 05:42:24 UTC
Permalink
Post by Ivan Godard
Post by MitchAlsup
Expand down stacks are preferred when the offset/displacement is positive
only.
Hunh??? Only if you are using SP as FP. But the existence of alloca and
dynamic arrays makes that impractical; you get poor code for call
idioms. Any architecture not hamstrung by legacy restrictions on number
of registers would use a different FP and SP, and for those a grow-up
uses positive offsets too.
Nothing forces the FP to point to the place where SP was when entering
the function. It could point to the deepest variable accessed through
FP, and thus have only positive offsets. Or (assuming 16-bit
offsets), it could point to entrySP-32768, so that zero-extended
offsets address the same memory area as signed offsets based at entrySP.
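The biasing argument can be worked through numerically (a sketch; the entry SP value is an arbitrary example):

```c
#include <stdint.h>

/* Signed 16-bit offsets around FP = entrySP reach
   [entrySP-32768, entrySP+32767]; zero-extended 16-bit offsets with a
   biased FP = entrySP - 32768 reach exactly the same window. */
static int same_window(uint32_t entry_sp)
{
    uint32_t lo_signed = entry_sp - 32768;   /* FP + (-32768) */
    uint32_t hi_signed = entry_sp + 32767;   /* FP + 32767    */

    uint32_t fp = entry_sp - 32768;          /* biased frame pointer     */
    uint32_t lo_unsigned = fp + 0;           /* smallest unsigned offset */
    uint32_t hi_unsigned = fp + 65535;       /* largest unsigned offset  */

    return lo_signed == lo_unsigned && hi_signed == hi_unsigned;
}
```

The identity holds for any entry SP, since entrySP - 32768 + 65535 == entrySP + 32767.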

- anton
--
M. Anton Ertl Some things have to be seen to be believed
***@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
John Levine
2015-07-13 15:30:28 UTC
Permalink
Post by Anton Ertl
Post by Ivan Godard
idioms. Any architecture not hamstrung by legacy restrictions on number
of registers would use a different FP and SP, and for those a grow-up
uses positive offsets too.
Nothing forces the FP to point to the place where SP was when entering
the function. It could point to the deepest variable accessed through
FP, and thus have only positive offsets.
It does make it easier to unwind stacks when dealing with exceptions,
although I agree that's not an overwhelming argument.
EricP
2015-07-14 01:51:49 UTC
Permalink
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.

When you work through the permutations of the above,
taking into account various object sizes and possible
interrupts that might allow an overwrite of a stack value,
I believe you'll find that grow down with pre-decrement-then-store
requires 1 arithmetic op, no temp registers, is interrupt safe,
and leaves SP pointing to the start of the object.

Push:
SP = SP - size
store (SP, src, size)

Pop:
load (SP, dst, size)
SP = SP + size

Also SP points to the first byte of the object and
can be used directly to access that object without
further indexing off SP which would require more cycles.

All the others require either more arithmetic ops, or temp registers,
or leave a window where an interrupt could allow a value to be clobbered.

For example, for a grow up stack, where SP points to
the next byte to write, you can't do a store then post increment
because an interrupt might clobber the value between those operations.
So you have to copy SP first, then adjust SP, then store,
and that takes an extra temp register.

Save:
tmp = SP
SP = SP + size
store (tmp, src, size)

So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.

Eric
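EricP's grow-down discipline can be written out as a simulation in C (on a byte array, not the real SP; a sketch, not generated code):

```c
#include <stdint.h>
#include <string.h>

/* Grow-down, pre-decrement-then-store.  Each operation is one SP
   update plus one access; after the decrement the slot is already
   below SP, so an interrupt handler pushing its own state between
   the two steps lands below the live value and cannot clobber it. */
typedef struct {
    uint8_t  mem[256];
    uint32_t sp;          /* byte index, grows downward */
} sim_stack;

static void push32(sim_stack *s, uint32_t v)
{
    s->sp -= 4;                        /* SP = SP - size    */
    memcpy(&s->mem[s->sp], &v, 4);     /* store(SP, src, 4) */
}

static uint32_t pop32(sim_stack *s)
{
    uint32_t v;
    memcpy(&v, &s->mem[s->sp], 4);     /* load(SP, dst, 4)  */
    s->sp += 4;                        /* SP = SP + size    */
    return v;
}
```

After a push, SP points at the first byte of the object just written, so the object can be addressed directly with no further indexing off SP.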
Ivan Godard
2015-07-14 02:17:26 UTC
Permalink
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
When you work through the permutations of the above,
taking into account various object sizes and possible
interrupts that might allow an overwrite of a stack value,
I believe you'll find that grow down with pre-decrement-then-store
requires 1 arithmetic op, no temp registers, is interrupt safe,
and leaves SP pointing to the start of the object.
SP = SP - size
store (SP, src, size)
load (SP, dst, size)
SP = SP + size
Also SP points to the first byte of the object and
can be used directly to access that object without
further indexing off SP which would require more cycles.
All the others require either more arithmetic ops, or temp registers,
or leave a window where an interrupt could allow a value to be clobbered.
For example, for a grow up stack, where SP points to
the next byte to write, you can't do a store then post increment
because an interrupt might clobber the value between those operations.
So you have to copy SP first, then adjust SP, then store,
and that takes an extra temp register.
tmp = SP
SP = SP + size
store (tmp, src, size)
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Except when the amount to grow the stack by is dynamic (as in alloca or
a dynamic array or similar in various languages). Then your approach
gives you a safe access to the newly allocated object, but loses the
location of the static-sized frame of things already allocated. You'd
have to push all the statics first, save the SP into an FP, and then do
all the dynamics.

Your approach is a classic hardware-centric view using "push" operations
into an area of memory that you are presumed to own, without concern for
wild addresses or exploits. It also assumes that ops are done
one-at-a-time, or at least the program model does so; the hardware at
some cost can figure out what it means to do two concurrent pushes. It
will work quite well (and quite economically) for matching
circumstances, such as a Z-80 in a thermostat.

In environments with more concern for RAS it is better to have a defined
notion of "frame" as opposed to "sea of address space", so that, for
example, code that returns a pointer to a frame local will fault rather
than being at the mercy of interrupt timing.

JMO; YMMV
EricP
2015-07-14 16:51:37 UTC
Permalink
Post by Ivan Godard
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Except when the amount to grow the stack by is dynamic (as in alloca or
a dynamic array or similar in various languages). Then your approach
gives you a safe access to the newly allocated object, but loses the
location of the static-sized frame of things already allocated. You'd
have to push all the statics first, save the SP into an FP, and then do
all the dynamics.
Yes. Problem?
Post by Ivan Godard
Your approach is a classic hardware-centric view using "push" operations
into an area of memory that you are presumed to own, without concern for
wild addresses or exploits. It also assumes that ops are done
one-at-a-time, or at least the program model does so; the hardware at
some cost can figure out what it means to do two concurrent pushes. It
will work quite well (and quite economically) for matching
circumstances, such as a Z-80 in a thermostat.
That would have more zing if you could work the phrase
"member of the hidebound orthodoxy" in there someplace. :-)

Actually, I prefer a RISC-ish approach with no ISA defined stack,
and stack space allocated in larger 16 or 32 bytes aligned chunks.
A sequence of stores are therefore not serially dependent.
Post by Ivan Godard
In environments with more concern for RAS it is better to have a defined
notion of "frame" as opposed to "sea of address space", so for example a
code that returns a pointer to a frame local will fault rather than
being at the mercy of interrupt timing.
JMO; YMMV
I'm not sure where you got the idea that I think code should be at the
mercy of interrupt timing. I believe I was quite clear it should not.
If one is designing a stack, even in user mode interrupts are
possible and it is therefore a consideration in such design that
it NOT screw up when they do occur.

I just don't buy into this idea that anything that hardware does
is going to magically fix everything that C can do wrong.
Yes, you can fiddle around the edges a bit, but really
that just moves the problem to a new soft spot.

So Mill style hardware managed stack frames in a safe area could help.
But then... C++ object with function pointer embedded in it?

Eric
BGB
2015-07-14 17:31:50 UTC
Permalink
Post by EricP
Post by Ivan Godard
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Except when the amount to grow the stack by is dynamic (as in alloca
or a dynamic array or similar in various languages). Then your
approach gives you a safe access to the newly allocated object, but
loses the location of the static-sized frame of things already
allocated. You'd have to push all the statics first, save the sSP into
a FP, and then do all the dynamics.
Yes. Problem?
my personal thought here is to not actually have dynamically-sized data
on the stack, but instead use solely fixed-layout stack frames.

in some of my codegens, things like arrays are generally handled with
specialized memory allocators, generally in a separate region of memory.
the compiler will quietly allocate the memory on function entry and free
it when the function returns, and the array variables are implicitly
actually pointers. the performance overhead of this hasn't really been
much of an issue IME.
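A sketch of that transformation: the source-level array local is rewritten so the storage lives in a separate arena and the variable is really a pointer (the arena and helper names are invented here; a real codegen would call its own runtime routines):

```c
#include <stddef.h>

static unsigned char arena[4096];   /* separate region for array locals */
static size_t arena_top;

static void *arena_alloc(size_t n)          /* bump-pointer allocate */
{
    void *p = &arena[arena_top];
    arena_top += (n + 15) & ~(size_t)15;    /* keep 16-byte alignment */
    return p;
}

static void arena_free_to(size_t mark)      /* wholesale free on return */
{
    arena_top = mark;
}

/* Source:  int sum_squares(int n) { int buf[n]; ... }  */
int sum_squares(int n)
{
    size_t mark = arena_top;                /* inserted at function entry */
    int *buf = arena_alloc(n * sizeof *buf);
    int i, total = 0;
    for (i = 0; i < n; i++)
        buf[i] = i * i;
    for (i = 0; i < n; i++)
        total += buf[i];
    arena_free_to(mark);                    /* inserted at function return */
    return total;
}
```

The stack frame proper stays fixed-layout; only the pointer to the array lives in it.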
Post by EricP
Post by Ivan Godard
Your approach is a classic hardware-centric view using "push"
operations into an area of memory that you are presumed to own,
without concern for wild addresses or exploits. It also assumes that
ops are done one-at-a-time, or at least the program model does so; the
hardware at some cost can figure out what it means to do two
concurrent pushes. It will work quite well (and quite economically)
for matching circumstances, such as a Z-80 in a thermostat.
That would have more zing if you could work the phrase
"member of the hidebound orthodoxy" in there someplace. :-)
Actually, I prefer a RISC-ish approach with no ISA defined stack,
and stack space allocated in larger 16 or 32 bytes aligned chunks.
A sequence of stores are therefore not serially dependent.
yeah.
to some extent, explicit push/pop operations are fairly redundant.

my idea for machine-level ISAs is maybe have a defined stack-register,
and define the stack direction mostly as part of the ABI.

though, granted, this does imply call/return via a link register rather
than via the stack.
Post by EricP
Post by Ivan Godard
In environments with more concern for RAS it is better to have a
defined notion of "frame" as opposed to "sea of address space", so for
example a code that returns a pointer to a frame local will fault
rather than being at the mercy of interrupt timing.
JMO; YMMV
I'm not sure where you got the idea that I think code should be at the
mercy of interrupt timing. I believe I was quite clear it should not.
If one is designing a stack, even in user mode interrupts are
possible and it is therefore a consideration in such design that
it NOT screw up when they do occur.
I just don't buy into this idea that anything that hardware does
is going to magically fix everything that C can do wrong.
Yes, you can fiddle around the edges a bit, but really
that just moves the problem to a new soft spot.
So Mill style hardware managed stack frames in a safe area could help.
But then... C++ object with function pointer embedded in it?
yeah.

IME, managing stack frames makes sense in a high-level ISA (such as a
bytecode), but not so much at the hardware level.


granted, I am also partly in a possibly minority camp that thinks
applications (even in "native" languages such as C or C++) should maybe
move away from targeting the raw HW ISA in future targets, and instead
target a bytecode which compiles to the native ISA. this then has the
advantage of not tying the application code as much to the specific
hardware, or getting the hardware as caught up in legacy issues.

though, yes, in such a situation, you will need essentially a JIT which
runs at the firmware or OS level, and almost invariably people will want
to try to bypass this JIT in the name of trying to squeeze more
performance out of the HW.

in such a situation though, the bytecode would look as-if it were the
native ISA from the application level. as opposed to current systems
where there is typically a big obvious VM seam and the VMs tend to be
structured in a way where any code running in them takes a bit of a
performance hit (most trying to be a bit higher-level than what I
suspect is ideal for a stand-in for a HW ISA).

granted, it may seem a bit funky to advocate a system design where the
only real true native code is located in the boot ROM, and pretty much
all the rest is transient and exists in caches...
Rick C. Hodgin
2015-07-14 12:51:53 UTC
Permalink
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
When you work through the permutations of the above,
taking into account various object sizes and possible
interrupts that might allow an overwrite of a stack value,
I believe you'll find that grow down with pre-decrement-then-store
requires 1 arithmetic op, no temp registers, is interrupt safe,
and leaves SP pointing to the start of the object.
SP = SP - size
store (SP, src, size)
load (SP, dst, size)
SP = SP + size
Also SP points to the first byte of the object and
can be used directly to access that object without
further indexing off SP which would require more cycles.
All the others require either more arithmetic ops, or temp registers,
or leave a window where an interrupt could allow a value to be clobbered.
For example, for a grow up stack, where SP points to
the next byte to write, you can't do a store then post increment
because an interrupt might clobber the value between those operations.
So you have to copy SP first, then adjust SP, then store,
and that takes an extra temp register.
tmp = SP
SP = SP + size
store (tmp, src, size)
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Eric
This seems to be an architectural decision issue, rather than a real
one. A system designer could enforce in hardware a protocol which
makes SP point to the last entry on an expand-up segment, making the
protocol for writing a new value also be completely atomic so that
interrupts do not occur in the middle of stack writes, performing
the write and then increment in a like manner as it is today in
expand-down segments.

I don't see this as a valid argument for expand-down segments, but
only one which would require those accommodations be made in hardware.
And in truth, I can see that something nearly identical must also
exist in expand-down segments, by way of a similar protocol.

Best regards,
Rick C. Hodgin
EricP
2015-07-14 16:53:23 UTC
Permalink
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
snip
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Eric
This seems to be an architectural decision issue, rather than a real
one. A system designer could enforce in hardware a protocol which
makes SP point to the last entry on an expand-up segment, making the
protocol for writing a new value also be completely atomic so that
interrupts do not occur in the middle of stack writes, performing
the write and then increment in a like manner as it is today in
expand-down segments.
Well, it is a sort-of-architectural decision. Some of the
permutations of the listed items produce non-functional results.
Of the functional ones, on balance it seems to (marginally) have
the most optimal attributes.

I say sort-of-architectural because the decision to pick one
approach and embed it in the architecture is not a functional necessity.
It is a historical fact that the PDP-11 and 8080 (and others) chose to
embed a stack in their architecture. That decision came about because
a PUSH or POP instruction saves a byte here and there.
In those days, that could make the difference between a design win and a loss.

Whatever method one chooses shouldn't require a non-interruptable sequence.
Or to put it another way, if one design requires a non-interruptable
sequence then I would view that as an unacceptable option.

One does have to be aware of the potential for interrupts
in the choice of design, even in user mode code.
Post by Rick C. Hodgin
I don't see this as a valid argument for expand-down segments, but
only one which would require those accommodations be made in hardware.
And in truth, I can see that something nearly identical must also
exist in expand-down segments, as by a similar protocol.
Best regards,
Rick C. Hodgin
These days, PUSH and POP are deprecated because the runtime prefers
to maintain stacks at some larger alignment, 16 bytes I think for Win64.
Also a sequence of PUSH's or POP's each change the SP, causing
multiple renames and an unnecessary dependency chain.
The modern sequence would be more RISC-ish, using 1 subtract
and a sequence of indexed stores.
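That sequence can be sketched as a simulation (on a byte array, with the corresponding x86-64 instructions in comments):

```c
#include <stdint.h>
#include <string.h>

/* One SP adjustment, rounded to the 16-byte alignment Win64 expects,
   then offset stores that are independent of one another, instead of
   a chain of PUSHes that each rename SP. */
static void store64(uint8_t *mem, uint32_t off, uint64_t v)
{
    memcpy(mem + off, &v, 8);
}

/*  sub  rsp, 32          ; 3*8 = 24 bytes, rounded up to 32
    mov  [rsp+0],  a      ; three stores with no dependency
    mov  [rsp+8],  b      ; on each other through SP
    mov  [rsp+16], c                                          */
static uint32_t build_frame(uint8_t *mem, uint32_t sp,
                            uint64_t a, uint64_t b, uint64_t c)
{
    sp -= 32;
    store64(mem, sp + 0,  a);
    store64(mem, sp + 8,  b);
    store64(mem, sp + 16, c);
    return sp;
}
```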

And of course, RISC doesn't need to embed a stack in an ISA at all.
Even for kernel mode interrupts a variation on branch-and-link makes
an architecture stack unnecessary, as I have described here previously.

So this is all for historical compatibility.

Eric
Rick C. Hodgin
2015-07-14 20:08:31 UTC
Permalink
[snip]
EricP, I am considering your reply and, Lord willing, will prepare
a response.

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-14 21:56:57 UTC
Permalink
Post by EricP
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
snip
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Eric
This seems to be an architectural decision issue, rather than a real
one. A system designer could enforce in hardware a protocol which
makes SP point to the last entry on an expand-up segment, making the
protocol for writing a new value also be completely atomic so that
interrupts do not occur in the middle of stack writes, performing
the write and then increment in a like manner as it is today in
expand-down segments.
Well, it is a sort-of-architectural decision. Some of the
permutations of the listed items produce non-functional results.
Of the functional ones, on balance it seems to (marginally) have
the most optimal attributes.
I say sort-of-architectural because the decision to pick one
approach and embed it in the architecture is not a functional necessity.
It is a historical fact that PDP-11 and 8080 (and others) chose to
embed a stack in their architecture. That decision comes because
a PUSH or POP instruction saves a byte here and there.
In those days, that could make the difference between a design win and a loss.
Whatever method one chooses shouldn't require a non-interruptable sequence.
Or to put it another way, if one design requires a non-interruptable
sequence then I would view that as an unacceptable option.
One does have to be aware of the potential for interrupts
in the choice of design, even in user mode code.
Post by Rick C. Hodgin
I don't see this as a valid argument for expand-down segments, but
only one which would require those accommodations be made in hardware.
And in truth, I can see that something nearly identical must also
exist in expand-down segments, as by a similar protocol.
Best regards,
Rick C. Hodgin
These days, PUSH and POP are deprecated because the runtime prefers
to maintain stacks at some larger alignment, 16 bytes I think for Win64.
Also a sequence of PUSH's or POP's each change the SP, causing
multiple renames and an unnecessary dependency chain.
The modern sequence would be more RISC-ish, using 1 subtract
and a sequence of indexed stores.
And of course, RISC doesn't need to embed a stack in an ISA at all.
Even for kernel mode interrupts a variation on branch-and-link makes
an architecture stack unnecessary, as I have described here previously.
So this is all for historical compatibility.
Eric
I have thought through a few. I may be missing something though.

I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.

-----
No data on 256 byte stack expand-down:
esp = 0xfc |........| 0xfc ... No data on stack
|........| 0xf8

No data on 256 byte stack expand-up:
|........| 0x04
esp = 0x00 |........| 0x00 ... No data on stack


-----
PUSH DWORD PTR 12345678h

Store a value on expand-down:
(1) Decrement esp
(2) Write 0x12345678
_[top]_________
|........| 0xfc ... Not used
esp = 0xf8 ... |12345678| 0xf8 ... esp pointing to pushed value at 0xf8
|........| 0xf4

Store a value on expand-up:
(1) Increment esp
(2) Write 0x12345678

|........| 0x08
esp = 0x04 ... |12345678| 0x04 --- esp pointing to pushed value at 0x04
_[top]_________|........| 0x00 ... Not used
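The two PUSH sequences above can be mirrored in a small C sketch (hypothetical, with a flat 256-byte array standing in for the stack segment); the resulting esp values and stored dwords match the diagrams.

```c
#include <stdint.h>

/* Expand-down PUSH: decrement esp by 4, then write at the new esp. */
static uint32_t push_down(uint32_t *esp, uint8_t *mem, uint32_t value)
{
    *esp -= 4;
    *(uint32_t *)(mem + *esp) = value;
    return *esp;
}

/* Expand-up PUSH: increment esp by 4, then write at the new esp. */
static uint32_t push_up(uint32_t *esp, uint8_t *mem, uint32_t value)
{
    *esp += 4;
    *(uint32_t *)(mem + *esp) = value;
    return *esp;
}
```

Starting from the empty-stack states shown (esp = 0xfc down, esp = 0x00 up), pushing 0x12345678 lands it at 0xf8 and 0x04 respectively.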


-----
ENTER 4,0

Create a frame on expand-down:
(1) Decrement esp
(2) Copy esp to ebp
(3) Write old ebp
(4) Subtract 4 from esp for ENTER 4,0
_[top]_________
|........| 0xfc ... Not used
|12345678| 0xf8 ... Saved value from PUSH at [ebp+4]
|old ebp | 0xf4 ... Old ebp save value at [ebp]
esp = 0xf0 |xxxxxxxx| 0xf0 ... esp pointing to local variable [ebp-4]

Create a frame on expand-up:
(1) Increment esp
(2) Copy esp to ebp
(3) Write old ebp
(4) Add 4 to esp for ENTER 4,0

esp = 0x0c |xxxxxxxx| 0x0c ... esp pointing to local variable [ebp+4]
|old ebp | 0x08 ... Old ebp save value at [ebp]
|12345678| 0x04 ... Saved value from PUSH at [ebp-4]
_[top]_________|........| 0x00 ... Not used
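The four ENTER steps in each direction can likewise be sketched in C (hypothetical helpers against a flat byte array; "locals" is the local-variable byte count, 4 in the ENTER 4,0 example above).

```c
#include <stdint.h>

/* Expand-down ENTER locals,0 */
static void enter_down(uint32_t *esp, uint32_t *ebp, uint8_t *mem, uint32_t locals)
{
    uint32_t old_ebp = *ebp;
    *esp -= 4;                            /* (1) decrement esp   */
    *ebp = *esp;                          /* (2) copy esp to ebp */
    *(uint32_t *)(mem + *esp) = old_ebp;  /* (3) write old ebp   */
    *esp -= locals;                       /* (4) subtract locals */
}

/* Expand-up ENTER locals,0 */
static void enter_up(uint32_t *esp, uint32_t *ebp, uint8_t *mem, uint32_t locals)
{
    uint32_t old_ebp = *ebp;
    *esp += 4;                            /* (1) increment esp   */
    *ebp = *esp;                          /* (2) copy esp to ebp */
    *(uint32_t *)(mem + *esp) = old_ebp;  /* (3) write old ebp   */
    *esp += locals;                       /* (4) add locals      */
}
```

From the post-PUSH states in the diagrams (esp = 0xf8 down, 0x04 up), this yields ebp = 0xf4 / 0x08 and esp = 0xf0 / 0x0c, with the caller's ebp saved at [ebp].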


-----
Allocate 16-bytes and populate edi pointer on expand-down:
(1) Subtract esp,16
(2) Copy esp to edi

_[top]_________
|........| 0xfc ... Not used
|12345678| 0xf8 ... Saved value from PUSH at [ebp+4]
|old ebp | 0xf4 ... Old ebp save value at [ebp]
esp = 0xf0 |xxxxxxxx| 0xf0 ... esp pointing to local variable [ebp-4]
|bbbbbbbb| 0xec
|bbbbbbbb| 0xe8
|bbbbbbbb| 0xe4
esp = 0xe0 |bbbbbbbb| 0xe0 ... esp pointing to start of buffer

Allocate 16-bytes and populate edi pointer on expand-up:
(1) Copy esp to edi
(2) Add esp,16

esp = 0x1c |bbbbbbbb| 0x1c ... esp pointing to end of buffer - 4
|bbbbbbbb| 0x18
|bbbbbbbb| 0x14
|bbbbbbbb| 0x10
|xxxxxxxx| 0x0c ... esp pointing to local variable [ebp+4]
|old ebp | 0x08 ... Old ebp save value at [ebp]
|12345678| 0x04 ... Saved value from PUSH at [ebp-4]
_[top]_________|........| 0x00 ... Not used

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-14 23:39:05 UTC
Permalink
I'll redo the allocate 16 bytes portion in a future
post.

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-15 14:31:43 UTC
Permalink
I'll redo the allocate 16 bytes portion in a future post.
-----
Allocate 16-bytes and populate edi pointer on expand-down:
(1) Subtract esp,16
(2) Copy esp to edi

_[top]_________
|........| 0xfc ... Not used
|12345678| 0xf8 ... Saved value from PUSH at [ebp+4]
ebp ---> |old ebp | 0xf4 ... Prior ebp at [ebp]
|xxxxxxxx| 0xf0 ... local variable [ebp-4]
|bbbbbbbb| 0xec
|bbbbbbbb| 0xe8
|bbbbbbbb| 0xe4
esp = 0xe0 |bbbbbbbb| 0xe0 ... esp pointing to start of buffer

(lowest to highest)
dwords within 16-bytes referenced using: [esp] 0xe0
[esp+4] 0xe4
[esp+8] 0xe8
[esp+12] 0xec

Allocate 16-bytes and populate edi pointer on expand-up:
(1) Copy esp+4 to edi
(2) Add esp,16

esp = 0x1c |bbbbbbbb| 0x1c ... esp pointing to end of buffer - 4
|bbbbbbbb| 0x18
|bbbbbbbb| 0x14
|bbbbbbbb| 0x10
|xxxxxxxx| 0x0c ... local variable [ebp+4]
ebp ---> |old ebp | 0x08 ... Prior ebp at [ebp]
|12345678| 0x04 ... Saved value from PUSH at [ebp-4]
_[top]_________|........| 0x00 ... Not used

(lowest to highest)
dwords within 16-bytes referenced using: [esp-12] 0x10
[esp-8] 0x14
[esp-4] 0x18
[esp] 0x1c
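The corrected allocation sequences can be checked with a small C sketch (hypothetical helpers; each adjusts esp and returns the edi value, i.e. the address of the lowest dword of the buffer).

```c
#include <stdint.h>

/* Expand-down: (1) Subtract esp,16  (2) Copy esp to edi */
static uint32_t alloc16_down(uint32_t *esp)
{
    *esp -= 16;
    return *esp;              /* buffer dwords at [esp] .. [esp+12]  */
}

/* Expand-up: (1) Copy esp+4 to edi  (2) Add esp,16 */
static uint32_t alloc16_up(uint32_t *esp)
{
    uint32_t edi = *esp + 4;
    *esp += 16;
    return edi;               /* buffer dwords at [esp-12] .. [esp]  */
}
```

With the starting esp values from the diagrams (0xf0 down, 0x0c up), edi comes out at 0xe0 and 0x10, matching the reference columns above.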

Best regards,
Rick C. Hodgin
EricP
2015-07-16 17:39:52 UTC
Permalink
Post by Rick C. Hodgin
I have thought through a few. I may be missing something though.
I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.
I don't follow you. I thought you wanted x86 compatibility,
and took your question about stack direction as just curiosity.
If you change the way the stack works, you might as well
do a new design from scratch.
In which case why have an ISA stack at all?

For x64, the MS calling standards includes a section on Stack Usage:
https://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx

Eric
Rick C. Hodgin
2015-07-16 17:51:47 UTC
Permalink
Post by EricP
Post by Rick C. Hodgin
I have thought through a few. I may be missing something though.
I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.
I don't follow you. I thought you wanted x86 compatibility,
and took your question about stack direction as just curiosity.
If you change the way the stack works, you might as well
do a new design from scratch.
In which case why have an ISA stack at all?
https://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
Eric
I think stacks are a good system. I think they should be used. I
think optimizations should be able to override the default stack
system per instance use, but on the whole I believe in stacks.

I have in mind to introduce some new architectural features which
allow the stack to not be the sole component of this inter-function
data exchange, as I also want to be able to support multiple return
parameters.

We'll see though. I don't have the time to go through all that is
required to pin everything down right now, though my desire remains
consistently strong as this thing I can't shake from my life or mind.
It is a consistent source of heartbreak for me that I don't have the
time to bring into fruition these dreams / goals. :-)

Best regards,
Rick C. Hodgin
Stephen Fuld
2015-07-16 18:14:06 UTC
Permalink
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
I have thought through a few. I may be missing something though.
I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.
I don't follow you. I thought you wanted x86 compatibility,
and took your question about stack direction as just curiosity.
If you change the way the stack works, you might as well
do a new design from scratch.
In which case why have an ISA stack at all?
https://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
Eric
I think stacks are a good system. I think they should be used. I
think optimizations should be able to override the default stack
system per instance use, but on the whole I believe in stacks.
The question isn't whether stacks are good or bad. I think everyone
thinks they are good in some uses and bad in others. The question for
architects is whether to provide special hardware such as special stack
manipulation instructions and dedicated stack related registers or to
have users use general purpose instructions and registers for those
purposes. Both of these choices have been done, and both can work.

If you do decide to have special purpose hardware, you have to decide
what it will do. Some example issues you have to deal with: Do you
expect to keep return addresses on the same stack as function arguments
or do you maintain different stacks for each of these? When you take an
interrupt, do you use the special stack related hardware or do you have
some other mechanism? These, and others, are all in addition to the
issue you started this thread with about up vs down.
Post by Rick C. Hodgin
I have in mind to introduce some new architectural features which
allow the stack to not be the sole component of this inter-function
data exchange, as I also want to be able to support multiple return
parameters.
This is as much a language issue as a hardware issue. There are many
ways for the hardware to return multiple values from a function. And,
of course if you don't provide special stack related hardware, it isn't
an architectural issue at all!
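Stephen's point that multiple return values are as much a language issue as a hardware one can be illustrated with a plain-C sketch (names here are illustrative only): packing the results in a small struct needs no special stack hardware, and on common ABIs the struct comes back in registers.

```c
#include <stdint.h>

/* A software-only multiple-return: bundle quotient and remainder in
   one struct and return it by value. */
typedef struct { uint32_t quot; uint32_t rem; } divmod_t;

static divmod_t divmod(uint32_t a, uint32_t b)
{
    divmod_t r = { a / b, a % b };
    return r;
}
```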
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Rick C. Hodgin
2015-07-16 18:28:33 UTC
Permalink
Post by Stephen Fuld
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
I have thought through a few. I may be missing something though.
I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.
I don't follow you. I thought you wanted x86 compatibility,
and took your question about stack direction as just curiosity.
If you change the way the stack works, you might as well
do a new design from scratch.
In which case why have an ISA stack at all?
https://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
Eric
I think stacks are a good system. I think they should be used. I
think optimizations should be able to override the default stack
system per instance use, but on the whole I believe in stacks.
The question isn't whether stacks are good or bad. I think everyone
thinks they are good in some uses and bad in others. The question for
architects is whether to provide special hardware such as special stack
manipulation instructions and dedicated stack related registers or to
have users use general purpose instructions and registers for those
purposes. Both of these choices have been done, and both can work.
First, thank you for your reply, Stephen. Second...

Agreed. When I say that I think stacks are "good" I don't mean that I
personally like them and for that reason I'm adding them. I mean that
I think they provide necessary utility in binary-based software design,
and one sufficient enough to be included in hardware so as to facilitate
that need with minimal software / compiler support.
Post by Stephen Fuld
If you do decide to have special purpose hardware, you have to decide
what it will do. Some example issues you have to deal with: Do you
expect to keep return addresses on the same stack as function arguments
or do you maintain different stacks for each of these? When you take an
interrupt, do you use the special stack related hardware or do you have
some other mechanism? These, and others, are all in addition to the
issue you started this thread with about up vs down.
I think interrupts should be isolated to servicing hardware requests,
and not used as a general interfacing mechanism between software and
its OS as by API. I think there should be dedicated mechanisms for
that expected and anticipated OS/app relationship, using a system
which is setup by the OS at process load time, thereby validating
their existence and security, and then allowing that interface to
operate at full speed without consistent validation. This exists
in various ISAs already, and I think it's the way it should be.

I further believe interrupts should have their own isolated stack
which does not impede data upon the process stack, allowing for
such things as forward stack usage without prior allocation as
for fast and immediate temporary variable usage.

And I have several other thoughts on stacks and inter-process and
inter-function communication protocols.
Post by Stephen Fuld
Post by Rick C. Hodgin
I have in mind to introduce some new architectural features which
allow the stack to not be the sole component of this inter-function
data exchange, as I also want to be able to support multiple return
parameters.
This is as much a language issue as a hardware issue. There are many
ways for the hardware to return multiple values from a function. And,
of course if you don't provide special stack related hardware, it isn't
an architectural issue at all!
Of course. However, I believe hardware should inject functionality to
simplify the life of its target audience, in this case the myriad of
software that will be using it and, more importantly, the software
developers who will maintain that software.

Computers have the potential to run at such incredible speeds today
that I don't really want to design things for the highest optimization,
but rather for the most functional use by those who will be wielding
them as by software and hardware interfacing. I want to provide
mechanisms which make other people's lives easier by doing the "hard
work" for them so they can simply use the tool that's provided.

In my experience and in observation over many years, I am usually
quite alone in my pursuit of that endeavor. :-) And because I am
not a manager in my vocation, but only in my personal "at home"
projects, it is not something I am often able to employ.

Best regards,
Rick C. Hodgin
David Brown
2015-07-22 14:12:35 UTC
Permalink
On 16/07/15 20:28, Rick C. Hodgin wrote:

I've snipped some things here, and some of my comments are more directed
at previous posts than the one quoted here, just to make a single post
rather than multiple ones.


You should make a decision clear - is this design going to be x86
compatible or not? If it is to be x86 compatible, then you should aim
to make it as close as possible to 100% to an existing x86 device
(perhaps excluding the legacy modes) in order to be able to re-use
software, tools and knowledge from the x86 world. As soon as you stray
from that, and you no longer have 100% binary compatibility, then you
might as well drop all connection with x86 - partial ISA compatibility
is no better or worse than zero compatibility. And when you have
dropped any thoughts about x86, you can look around at the far better
existing ISAs, and then create a new one that is even better :-) The
x86 world is living proof that you /can/ polish a turd - but it is not
an architecture to emulate.


Assuming you are not interested in x86 for this design, the next
question is whether or not you want to aim for compatibility with
something else. You should look closely at a number of architectures -
I would include ARM, ColdFire, MIPS, SPARC, OpenRISC, and perhaps
PowerPC as modern 32-bit or 64-bit cores to consider. You might decide
that one of these suits your needs, and you can then take advantage of
existing tools (at least until you have got your own ones up to speed).
Even if you don't want to copy them directly, they can give you
inspiration and ideas for your own design.

(Note that there may be licensing, copyright or patent issues involved
here, if that concerns you.)
Post by Rick C. Hodgin
I think interrupts should be isolated to servicing hardware requests,
and not used as a general interfacing mechanism between software and
its OS as by API. I think there should be dedicated mechanisms for
that expected and anticipated OS/app relationship, using a system
which is setup by the OS at process load time, thereby validating
their existence and security, and then allowing that interface to
operate at full speed without consistent validation. This exists
in various ISAs already, and I think it's the way it should be.
I further believe interrupts should have their own isolated stack
which does not impede data upon the process stack, allowing for
such things as forward stack usage without prior allocation as
for fast and immediate temporary variable usage.
It is common to have separate stack pointers for user space and
supervisor space, and perhaps also for interrupts (though that is often
just part of supervisor space). For RISC architectures that don't have
a dedicated stack pointer, but merely conventional usage of a register
as a stack pointer, this is of course handled in software.
Post by Rick C. Hodgin
And I have several other thoughts on stacks and inter-process and
inter-function communication protocols.
If I were designing a new architecture, then interprocess communication
would be a key aspect. I would be aiming for some sort of hardware
semaphores, and perhaps pipelines or message queues, along with hardware
to make thread and task switching efficient. My interest is in embedded
systems rather than general-purpose cpus, but the principle is the same.

An architecture for inspiration here would be XMOS.


Since you are also interested in language design and OS design, it might
be better to start higher up in the process. Think what sort of
features your languages need, and how that might be best implemented in
hardware - and look to making that part of the ISA. To take extreme
examples, if you were a Forth fan then your cpu should be able to work
efficiently with a data stack, but would not need many registers. If
you want to work with video data, then strong SIMD support would be key.
If you are aiming for safety over efficiency, then you might want
hardware that makes it easy to include bounds along with pointer
operations. If you wanted C++ support, then perhaps you would have
hardware dedicated to fast vtable accesses and exceptions. And so on.

Similarly, consider how your OS will work and what features it could
benefit from - putting them in hardware can make the code easier and
more efficient.

And don't be afraid to limit options if it is more practical - there is
no need to support up and down stacks if all usage of the chip will be
with a grow-down stack. Too many choices is not much better than too
few choices.
Rick C. Hodgin
2015-07-22 14:24:44 UTC
Permalink
Post by David Brown
[snip]
You should make a decision clear - is this design going to be x86
compatible or not?
You may have missed previous discussions here from a few months ago,
and a few recent posts which relate back to it.

Here's a deep link to the project, one which shows that it has support
for three ISAs, i80386-x40 (a 40-bit extension of the original i80386
design, with some i486 and later extensions added), an ARM-based design
which is extended to 40-bits, and then my own ISA which I call LibSF
386-x40, or li386-x40 for short:

https://github.com/RickCHodgin/libsf/blob/master/li386/li386-documentation/images/eflags.png

My goals are to make things easy for developers, not to have the most
efficient design or code that's possible as by protracted effort. All
of that can come through revision. I want to empower people to be able
to build things using the tools I give them as easy building blocks to
build atop. Optimization can come later if necessary because today
computers are beyond being fast enough.

Best regards,
Rick C. Hodgin
David Brown
2015-07-22 17:47:27 UTC
Permalink
Post by Rick C. Hodgin
Post by David Brown
[snip]
You should make a decision clear - is this design going to be x86
compatible or not?
You may have missed previous discussions here from a few months ago,
and a few recent posts which relate back to it.
Here's a deep link to the project, one which shows that it has support
for three ISAs, i80386-x40 (a 40-bit extension of the original i80386
design, with some i486 and later extensions added), an ARM-based design
which is extended to 40-bits, and then my own ISA which I call LibSF
https://github.com/RickCHodgin/libsf/blob/master/li386/li386-documentation/images/eflags.png
I have seen a number of your threads in the past discussing plans for a
40-bit extension to 80386 and for ARM-based instruction set. But you
said in this thread "This would be for a new architecture, not one
confined to x86's legacy baggage".
Post by Rick C. Hodgin
My goals are to make things easy for developers, not to have the most
efficient design or code that's possible as by protractive effort. All
of that can come through revision. I want to empower people to be able
to build things using the tools I give them as easy building blocks to
build atop. Optimization can come later if necessary because today
computers are beyond being fast enough.
The x86 ISA is in no way "easy for developers". It is a horrible
design. If you make a chip that is entirely compatible (at least for
user space - it doesn't matter so much if there are differences visible
to the OS), then at least it is a familiar horror and developers can use
the code and tools they know. But making something that is mostly like
the 80386 (a very outdated version of the x86 family), but somewhat
different, gives developers the worst of both worlds while
simultaneously making your own job as difficult as possible. The same
applies, but to a lesser extent, with a 40-bit extension to ARM.

Making a cpu design with three different ISA's is an extremely difficult
task. Supporting two ISA's is hard enough - you need only look at the
dog's breakfast Intel did with the Itanium's x86 support to see that
(and that's with a top team of experienced cpu designers, intimately
familiar with both ISA's). Supporting two general ISA's only makes
sense when you need to run legacy binaries - and thus only makes sense
when you have complete compatibility with existing systems. Do you
really expect people to run existing 80386 programs directly on this
chip? Or existing ARM programs? Without recompiling? I honestly think
that would be highly unlikely. You will not be making things easy for
developers, but will be giving them the confusing choice of three ISA's
- one of which is completely unknown but presumably nice (your own one),
one of which is mostly like 386 but horrible to work with, and one that
is mostly like ARM but different enough to need new tools.

Thus I suggest that if you want to make a new cpu, either make it with a
single compatible existing ISA, or make it with a single new one that
fits the goals of the rest of your system better. (It occurs to me that
with your love of edit-and-continue style debugging, it is likely that
an ISA could be designed with this in mind to make it more powerful and
more efficient.)

Oh, and the computers of today are not "beyond fast enough" - remember
Wirth's law "software is getting slower more rapidly than hardware
becomes faster". Though I agree that optimisation and speed come as
lower priority - Heathfield's law applies to hardware design as well as
software.
Rick C. Hodgin
2015-07-22 17:57:36 UTC
Permalink
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
[snip]
You should make a decision clear - is this design going to be x86
compatible or not?
You may have missed previous discussions here from a few months ago,
and a few recent posts which relate back to it.
Here's a deep link to the project, one which shows that it has support
for three ISAs, i80386-x40 (a 40-bit extension of the original i80386
design, with some i486 and later extensions added), an ARM-based design
which is extended to 40-bits, and then my own ISA which I call LibSF
https://github.com/RickCHodgin/libsf/blob/master/li386/li386-documentation/images/eflags.png
I have seen a number of your threads in the past discussing plans for a
40-bit extension to 80386 and for ARM-based instruction set. But you
said in this thread "This would be for a new architecture, not one
confined to x86's legacy baggage".
I am not supporting x86 exactly as it is. There are extensions being
added, but at the opcode level is is more or less compatible. Some
differences exist because it can support 40-bit pointers, and 40-bit
registers (wax instead of eax).
Post by David Brown
Post by Rick C. Hodgin
My goals are to make things easy for developers, not to have the most
efficient design or code that's possible as by protractive effort. All
of that can come through revision. I want to empower people to be able
to build things using the tools I give them as easy building blocks to
build atop. Optimization can come later if necessary because today
computers are beyond being fast enough.
The x86 ISA is in no way "easy for developers". It is a horrible
design. If you make a chip that is entirely compatible (at least for
user space - it doesn't matter so much if there are differences visible
to the OS), then at least it is a familiar horror and developers can use
the code and tools they know. But making something that is mostly like
the 80386 (a very outdated version of the x86 family), but somewhat
different, gives developers the worst of both worlds while
simultaneously making your own job as difficult as possible. The same
applies, but to a lesser extent, with a 40-bit extension to ARM.
You should read a little about my design if interested. I've opened
up quite a bit by adding register window extensions, and also by
introducing an additional interrupt stack. No longer do all temporary
variables have to be allocated on the stack, which reduces stack
pressure.
Post by David Brown
Making a cpu design with three different ISA's is an extremely difficult
task. Supporting two ISA's is hard enough - you need only look at the
dog's breakfast Intel did with the Itanium's x86 support to see that
(and that's with a top team of experienced cpu designers, intimately
familiar with both ISA's). Supporting two general ISA's only makes
sense when you need to run legacy binaries - and thus only makes sense
when you have complete compatibility with existing systems. Do you
really expect people to run existing 80386 programs directly on this
chip?
No.
Post by David Brown
Or existing ARM programs?
No.
Post by David Brown
Without recompiling? I honestly think that would be highly unlikely.
It will require recompiling in all cases, but the hardware will be
similar enough (especially in i80386-x40 mode) so that everything
people know today about the i386 will be generally conveyed, with
a few extensions provided for by new flags.
Post by David Brown
You will not be making things easy for
developers, but will be giving them the confusing choice of three ISA's
- one of which is completely unknown but presumably nice (your own one),
one of which is mostly like 386 but horrible to work with, and one that
is mostly like ARM but different enough to need new tools.
I don't generally plan on supporting legacy applications or operating
systems. I plan on completing my Exodus OS, and my Armodus OS, and
porting the unified form to my own hardware, and using my own Visual
FreePro language, and C compilers, creating my own hardware and
software stack as an offering unto the Lord. An open effort given to
mankind using the skills He first gave me, and not doing it for money.
Post by David Brown
Thus I suggest that if you want to make a new cpu, either make it with a
single compatible existing ISA, or make it with a single new one that
fits the goals of the rest of your system better. (It occurs to me that
with your love of edit-and-continue style debugging, it is likely that
an ISA could be designed with this in mind to make it more powerful and
more efficient.)
I appreciate your advice.
Post by David Brown
Oh, and the computers of today are not "beyond fast enough" - remember
Wirth's law "software is getting slower more rapidly than hardware
becomes faster". Though I agree that optimisation and speed come as
lower priority - Heathfield's law applies to hardware design as well as
software.
The 500 MHz Pentium III I had back in 2000 was beyond fast enough.
Software has gotten far more complex because machine resources existed
to make it easier for them to write at higher levels of code, letting
the machine do the extra work for them. But back in the day, we
managed to do quite a lot on the 500 MHz machines.

For my LibSF 386-x40 CPU, I'm targeting a maximum of 100 MHz clock
speed, or thereabouts. It has a 5-stage pipeline and I doubt I will
be able to get much more out of it than that, except for the graces
provided for by the sapphire substrate, and any new process technologies
which come down the pipe. And we shall see on that.

I would welcome help on this project. As it is, it will simply be my
life's work offered unto the Lord, and to man.

Best regards,
Rick C. Hodgin
David Brown
2015-07-22 18:36:50 UTC
Permalink
Post by Rick C. Hodgin
Post by David Brown
Making a cpu design with three different ISA's is an extremely difficult
task. Supporting two ISA's is hard enough - you need only look at the
dog's breakfast Intel did with the Itanium's x86 support to see that
(and that's with a top team of experienced cpu designers, intimately
familiar with both ISA's). Supporting two general ISA's only makes
sense when you need to run legacy binaries - and thus only makes sense
when you have complete compatibility with existing systems. Do you
really expect people to run existing 80386 programs directly on this
chip?
No.
Post by David Brown
Or existing ARM programs?
No.
Post by David Brown
Without recompiling? I honestly think that would be highly unlikely.
It will require recompiling in all cases, but the hardware will be
similar enough (especially in i80386-x40 mode) so that everything
people know today about the i386 will be generally conveyed, with
a few extensions provided for by new flags.
The percentage of developers that know anything about the details of the
i386 is almost negligible. Almost everyone who codes for x86 systems
does so using C or a higher level language. The same is true of ARM,
except with ARM there a larger number of low-level programmers who are
at least familiar with the assembly from microcontroller development.

The people that know 386 coding well enough to be able to do something
useful at that level are mostly :

1. People who write compilers (and they would all prefer to write
compilers for a different architecture than x86).

2. People writing very low-level code in operating systems (and they
would mostly prefer a different architecture than x86).

3. People writing highly optimised library code for time-critical
functions (and they use SIMD vector units, and care little for the
general purpose coding).

4. People who think assembly language is a fun and challenging hobby,
and use x86 because that's what they have on their PC's.

I suspect that you fall into all four categories here!

But the point is that the great majority of other developers that might
use this chip will be happy if there is a decent C compiler for it, or
perhaps the other development tools that you hope to make. There is no
advantage to them in having any x86 compatibility at the cpu level.

But if you start with a clean slate and implement some novel and
interesting features, such as ones aimed at improved debugging
(automatic bounds checking, run-time code patching or thread-aware
breakpointing, for example), then they will see something special in
your cpu rather than just a slow, sort-of x86 device with a weird and
ugly way of breaking the 4GB barrier (something that was done long ago
with PAE on the Pentium Pro).
Post by Rick C. Hodgin
Post by David Brown
You will not be making things easy for
developers, but will be giving them the confusing choice of three ISA's
- one of which is completely unknown but presumably nice (your own one),
one of which is mostly like 386 but horrible to work with, and one that
is mostly like ARM but different enough to need new tools.
I don't generally plan on supporting legacy applications or operating
systems. I plan on completing my Exodus OS, and my Armodus OS, and
porting the unified form to my own hardware, and using my own Visual
FreePro language, and C compilers, creating my own hardware and
software stack as an offering unto the Lord. An open effort given to
mankind using the skills He first gave me, and not doing it for money.
That's what I thought was your plan, at least in the long term - though
you might have been expecting to have compatible tools and binaries as a
step towards that goal. And with that plan, I can't see how x86 or ARM
partial compatibility does anything but give you more work along the
way, and a poorer result in the end.
Post by Rick C. Hodgin
Post by David Brown
Thus I suggest that if you want to make a new cpu, either make it with a
single compatible existing ISA, or make it with a single new one that
fits the goals of the rest of your system better. (It occurs to me that
with your love of edit-and-continue style debugging, it is likely that
an ISA could be designed with this in mind to make it more powerful and
more efficient.)
I appreciate your advice.
I am aiming to help - and I am trying not to sound negative, though I
think you already know that I am sceptical of the level of ambition you
have here. My aim here is to recommend a clearer and more
time-efficient path towards your goal, as far as I can see it.
Post by Rick C. Hodgin
Post by David Brown
Oh, and the computers of today are not "beyond fast enough" - remember
Wirth's law "software is getting slower more rapidly than hardware
becomes faster". Though I agree that optimisation and speed come as
lower priority - Heathfield's law applies to hardware design as well as
software.
The 500 MHz Pentium III I had back in 2000 was beyond fast enough.
Software has gotten far more complex because machine resources existed
to make it easier for developers to write at higher levels of code, letting
the machine do the extra work for them. But back in the day, we
managed to do quite a lot on the 500 MHz machines.
I have done a great deal with slower systems than that - and I still do.
But unfortunately, users expect systems like we have today, not like
those we were happy with 15 years ago.
Post by Rick C. Hodgin
For my LibSF 386-x40 CPU, I'm targeting a maximum of 100 MHz clock
speed, or thereabouts. It has a 5-stage pipeline and I doubt I will
be able to get much more out of it than that, except for the graces
provided for by the sapphire substrate, and any new process technologies
which come down the pipe. And we shall see on that.
I would welcome help on this project. As it is, it will simply be my
life's work offered unto the Lord, and to man.
Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-22 19:01:41 UTC
Permalink
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
Making a cpu design with three different ISA's is an extremely difficult
task. Supporting two ISA's is hard enough - you need only look at the
dog's breakfast Intel did with the Itanium's x86 support to see that
(and that's with a top team of experienced cpu designers, intimately
familiar with both ISA's). Supporting two general ISA's only makes
sense when you need to run legacy binaries - and thus only makes sense
when you have complete compatibility with existing systems. Do you
really expect people to run existing 80386 programs directly on this
chip?
No.
Post by David Brown
Or existing ARM programs?
No.
Post by David Brown
Without recompiling? I honestly think that would be highly unlikely.
It will require recompiling in all cases, but the hardware will be
similar enough (especially in i80386-x40 mode) so that everything
people know today about the i386 will be generally conveyed, with
a few extensions provided for by new flags.
The percentage of developers that know anything about the details of the
i386 is almost negligible. Almost everyone who codes for x86 systems
does so using C or a higher level language. The same is true of ARM,
except with ARM there is a larger number of low-level programmers who are
at least familiar with the assembly from microcontroller development.
The people that know 386 coding well enough to be able to do something
useful at that level are mostly:
1. People who write compilers (and they would all prefer to write
compilers for a different architecture than x86).
2. People writing very low-level code in operating systems (and they
would mostly prefer a different architecture than x86).
3. People writing highly optimised library code for time-critical
functions (and they use SIMD vector units, and care little for the
general purpose coding).
4. People who think assembly language is a fun and challenging hobby,
and use x86 because that's what they have on their PCs.
I suspect that you fall into all four categories here!
It's why I'm doing this for people. I want people to be able to use
the tools I provide them at the Visual FreePro level (graphical UI,
data engine, network communication, Internet) for general apps, and
to be able to dip down into my C compiler called RDC for low-level
stuff, and then into assembly for the lowest-level where required.

I want to use my skills to make other people's lives easier. And
even at the hardware level for low-level developers, and for hardware
people, I want to provide simple connectivity and easy APIs for
add-on devices.

Those are my targets: a robust platform that works, is extensible,
and easy to be adapted to by people who know how to adapt to it.
And for those who don't want to operate at those low levels, there
is a graphical Visual Studio-like environment which allows them to
write, test, and debug robust code for general apps.
Post by David Brown
But the point is that the great majority of other developers that might
use this chip will be happy if there is a decent C compiler for it, or
perhaps the other development tools that you hope to make. There is no
advantage to them in having any x86 compatibility at the cpu level.
But if you start with a clean slate and implement some novel and
interesting features, such as ones aimed at improved debugging
(automatic bounds checking, run-time code patching or thread-aware
breakpointing, for example), then they will see something special in
your cpu rather than just a slow, sort-of x86 device with a weird and
ugly way of breaking the 4GB barrier (something that was done long ago
with PAE on the Pentium Pro).
PAE was still limited to 4 GB per task. The machine could address more
than 4 GB, but as Linus Torvalds is quoted, "Only if you jumped through
hoops."

LibSF 386-x40 is a true 32-bit or 40-bit machine. One Terabyte per core
of local memory, along with a shared terabyte, resulting in three
terabytes of addressable memory per my highest-end design.
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
You will not be making things easy for
developers, but will be giving them the confusing choice of three ISA's
- one of which is completely unknown but presumably nice (your own one),
one of which is mostly like 386 but horrible to work with, and one that
is mostly like ARM but different enough to need new tools.
I don't generally plan on supporting legacy applications or operating
systems. I plan on completing my Exodus OS, and my Armodus OS, and
porting the unified form to my own hardware, and using my own Visual
FreePro language, and C compilers, creating my own hardware and
software stack as an offering unto the Lord. An open effort given to
mankind using the skills He first gave me, and not doing it for money.
That's what I thought was your plan, at least in the long term - though
you might have been expecting to have compatible tools and binaries as a
step towards that goal. And with that plan, I can't see how x86 or ARM
partial compatibility does anything but give you more work along the
way, and a poorer result in the end.
Nearly all of the code should work with a recompile. No real changes.
And this includes code which has been designed around x86 quirks,
including nearly all of the assembly language opcodes.

That's a huge base of code that can be ported with minimal effort.
I also eventually plan to provide a .NET compatible library, and a C#
compiler to allow that code to port over as well.
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
Thus I suggest that if you want to make a new cpu, either make it with a
single compatible existing ISA, or make it with a single new one that
fits the goals of the rest of your system better. (It occurs to me that
with your love of edit-and-continue style debugging, it is likely that
an ISA could be designed with this in mind to make it more powerful and
more efficient.)
I appreciate your advice.
I am aiming to help - and I am trying not to sound negative, though I
think you already know that I am sceptical of the level of ambition you
have here. My aim here is to recommend a clearer and more
time-efficient path towards your goal, as far as I can see it.
It is very ambitious, but I do not consider myself to be doing it
alone. I have a large amount of general knowledge on the tasks I
am pursuing. In some areas (x86, OS development, coding and
algorithms in general) I consider myself to be an expert. In the
areas of physical hardware, I had never touched Verilog before
last year, for example. But, it made perfect sense to me, and I
was able to run things through simulation which worked. I have
no doubts I can do it, but I am unproven.

And in the areas of true hardware design outside of the safe FPGA
environments, there is so much I will need help on. But, my
prayers are to the Lord, and I know that if it's part of His plan,
His will, that it will happen.

If not, it's been a very fun project and I've enjoyed it. It has
made me sad that others haven't come on board to help me though.
Perhaps in time that sadness will turn to joy.
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
Oh, and the computers of today are not "beyond fast enough" - remember
Wirth's law "software is getting slower more rapidly than hardware
becomes faster". Though I agree that optimisation and speed come as
lower priority - Heathfield's law applies to hardware design as well as
software.
The 500 MHz Pentium III I had back in 2000 was beyond fast enough.
Software has gotten far more complex because machine resources existed
to make it easier for developers to write at higher levels of code, letting
the machine do the extra work for them. But back in the day, we
managed to do quite a lot on the 500 MHz machines.
I have done a great deal with slower systems than that - and I still do.
But unfortunately, users expect systems like we have today, not like
those we were happy with 15 years ago.
I think having lots of parallel devices with their own compute abilities
will allow a slower central core to provide usable technology, especially
when you run multiple of them, and workloads for specific things are
off-loaded, such as a GPU handling its own rendering ops.
Post by David Brown
Post by Rick C. Hodgin
For my LibSF 386-x40 CPU, I'm targeting a maximum of 100 MHz clock
speed, or thereabouts. It has a 5-stage pipeline and I doubt I will
be able to get much more out of it than that, except for the graces
provided for by the sapphire substrate, and any new process technologies
which come down the pipe. And we shall see on that.
I would welcome help on this project. As it is, it will simply be my
life's work offered unto the Lord, and to man.
Best regards,
Rick C. Hodgin
John Dallman
2015-07-22 23:54:00 UTC
Permalink
Post by Rick C. Hodgin
Nearly all of the code should work with a recompile. No real changes.
And this includes code which has been designed around x86 quirks,
including nearly all of the assembly language opcodes.
I still don't see why you want to make it 40-bit rather than 64-bit. A
lot of software is now 64-bit capable, and more will be. And porting
between 64-bit platforms is easier than porting to a new pointer size.
Neatness of setting up segments just isn't relevant to application
programmers.

John
Rick C. Hodgin
2015-07-23 01:32:49 UTC
Permalink
Post by John Dallman
Post by Rick C. Hodgin
Nearly all of the code should work with a recompile. No real changes.
And this includes code which has been designed around x86 quirks,
including nearly all of the assembly language opcodes.
I still don't see why you want to make it 40-bit rather than 64-bit. A
lot of software is now 64-bit capable, and more will be. And porting
between 64-bit platforms is easier than porting to a new pointer size.
Neatness of setting up segments just isn't relevant to application
programmers.
To be honest, I don't know why I had the initial idea. However,
something later happened which made me think it may have been
divinely inspired.

The number 40 is significant in the Bible. And in late 2014 I had
a dream where I saw a very specific sequence of events. I wrote
about it sometime shortly thereafter on my Facebook page. Here is
the series of images and descriptions I saw in the dream:

https://www.facebook.com/photo.php?fbid=10152679283083145&l=a0ce1c0139

It was a progressive sequence which flowed like a fluid animation.
When I published the images, I purposefully withheld part of the
sequence, which I did knowingly, but also didn't know why at the
time. To be honest, I still don't know why. :-) In any event...

At the time I didn't think twice about it, except that it really
struck me as unusual because I don't usually have dreams like that.

Later, when I was at a co-worker's desk, he had his computer
monitor's desktop showing a sequence of images. His chosen theme
was flowers and butterflies on flowers. I was helping him with a
Visual FoxPro computer problem, and it took several seconds here
and there to process the file to the point where the error in the
program was, so I sat there looking at his desktop watching it
cycle through the images. The more I looked at the images on the
butterfly wings, the more I began to realize they looked like UV-
unwrapped texture data as you might find in a 3D animation project.
In fact, it was so striking in that regard I was floored.

I then began to consider the dual-nature of the butterfly, how it
starts out life as a caterpillar and then sometime later goes into
its chrysalis with its body dissolving and reassembling itself into
a new creation, and in that way the DNA of the butterfly serves two
distinct purposes: caterpillar, and butterfly.

I began to consider my relationship with God, and my faith in Jesus
Christ, and the message of the Bible, and it occurred to me that
perhaps God had encoded a message within our DNA (the DNA of all
life) so that each organism He created, those which likely follow
after kinds specifically, compose a message He crafted before He
created the Earth. And that within that message was information
that could not be decoded until the end times when the technology
we would have would be able to sequence the genomes and identify
the base-4 nature of DNA.
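The "base-4 nature" maps naturally onto two bits per base; a small C sketch, with an A/C/G/T assignment chosen purely for illustration (the text fixes no particular mapping):

```c
#include <stdint.h>

/* Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3.  This is one of
 * 24 possible assignments, picked arbitrarily for the example. */
static int base2bits(char b)
{
    switch (b) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;   /* not a recognised base */
    }
}

/* Pack four bases into one byte, first base in the high two bits. */
static uint8_t pack4(const char *seq)
{
    uint8_t out = 0;
    for (int i = 0; i < 4; i++)
        out = (uint8_t)((out << 2) | (uint8_t)base2bits(seq[i]));
    return out;
}
```

Under this mapping, pack4("ACGT") yields 0x1B, and five such bytes hold the 40 bits of one 5x8 "page".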

I then reflected back on my cross dream. The cross was laid out
in a very specific pattern, with each part of the cross being
specifically comprised of two cyan colored circles touching each
other. It occurred to me that this double-connection is very
similar to the relationship seen in the A,T and C,G pairing found
in DNA base pairs.

I then began to consider the geometry of the cross layout in my
dream and drew out a format which showed what it would be if the
parts which were not part of the cross had been filled in with
emptiness, but rather filled in with other double-cyan circles.
This resulted in this image:

https://www.facebook.com/photo.php?fbid=10152874010973145&l=226fe1ff3b

If you'll note, there is a 5 x 8 configuration there, which is
40 pieces.

I began to consider this in relation to my prior unknown-as-to-why-I-
had-it interest in the 40-bit 80386 extension project, and there were
just too many similarities to ignore.

This led to me thinking about all of this for a few days, and I then
created a project I call "DNA Project Butterfly":

https://groups.google.com/forum/#!forum/dnaprojectbutterfly

I began to consider that the A,T and C,G combinations relate to
separate sets of data, possibly instructions and data, or possibly
immediate data and address data within the volumetric DNA data which
God had seeded in the "unused" or "junk" DNA sections that our
scientists to date have not been able to find a purpose for.

I began writing some software which explored this possibility,
trying various combinations on the first genome which occurred to
me, which was the Monarch Butterfly (see the image on the Google
groups link above).

If you look closely at the wing, you can see that it looks like
there's a simple candle there. I again reflected back on the Bible
and the line, "And God said 'Let there be light.' And there was
light."

The word Monarch means "king" and the Monarch butterfly has this
candle image on its wing, one which is very obvious to see, and
since Jesus is called King of kings, and Lord of lords in scripture,
I thought, "Maybe this is God's way of starting the ball in motion,
so that the information contained within the DNA can be discovered."

I was able to procure a download of the Monarch butterfly DNA and
began scanning for common data types we have, assuming a type of
32-bit floating point representation for the 3D data points, colors,
texture mapping, etc. I didn't find anything obvious. I began
looking for other forms, and over time I began to consider the 5x8
sequence of the cross dream I had, with the sequences on what would've
been the cross posts themselves as being special data. I also tried
to inject ideas relating to things God has instilled in His people by
way of guidance, such as letting the land lie fallow every 7th year, and
the 7x7 year of Jubilee, etc.

I simply haven't had enough time to go through the permutations. But
in the process of discovery I was able to find a lot of information
in prime numbers which, I think, when you examine their digital roots,
reveals some crucial information about how to decode the DNA in a way
that sorts out "noise" from "signal".
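For reference, the digital root of a number is what remains after summing its decimal digits repeatedly until a single digit is left; a small C helper (a generic sketch, not taken from the poster's software):

```c
/* Digital root: repeatedly sum decimal digits until a single digit
 * remains.  For n > 0 this equals 1 + (n - 1) % 9. */
static int digital_root(unsigned long n)
{
    while (n > 9) {
        unsigned long sum = 0;
        for (; n != 0; n /= 10)
            sum += n % 10;
        n = sum;
    }
    return (int)n;
}
```

For example, digital_root(137) is 2 (1+3+7 = 11, then 1+1 = 2). One property worth noting: no prime above 3 can have a digital root of 3, 6, or 9, since that would make it divisible by 3.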

It's all a theory, and one I have failed to test to date. But, to
answer your question why I was going to 40-bits ... I don't know.
I believe it has been divinely inspired so that I would continue on
this path with DNA Project Butterfly, or at least seed the idea to
those who have the time, funding, and technical skill to continue
the work and discover that which I believe is there: A full 3D
animation sequence from God to man, one which reveals inside the
very DNA of all life the story of God, Man, the Fall, our Salvation
through Jesus Christ, our Restoration to His Holy Kingdom, and the
condemnation of those who remain in rebellion against Him, those
who are unwilling to go to His Son and ask forgiveness for their
sin.

It would really be quite something. And I look forward to seeing
how that particular story ends.

But as for the 40-bit CPU, I believe that encoded within the "pages"
of DNA data in the 5x8 (40-bit) pattern are something like what we
saw on the Itanium (whose 128-bit bundles actually hold three 41-bit
instruction slots): bundles of 40-bit words containing instructions
which define and self-assemble the message God put into the DNA data,
so that it is itself the computer program, the hardware to run it
(through an understanding of what the data there means, then applied
to the computer hardware we have today through an emulation library
that is constructed), and the data from which the computer program
is seeded, and continues to wield throughout the movie.

I have also had the thought that Ezekiel's Third Temple, described
in much greater detail in the Bible than the other two temples,
possibly describes an integrated circuit. When the Lord physically
appeared on the Earth He would walk over blue sapphire. And it
occurred to me that this sapphire substrate could be the foundation
for a radio circuit of some kind, one which also contains logic and
compute abilities, that is essentially the hardware required to
compute the DNA Project Butterfly story.

It's all theory. It's all wild, I admit. However, these things
have occurred to me, and I have pursued them, and they have not
gone away from my thinking, but are placed back into my thoughts
from time to time. I have a white board at home covered with ideas
relating to these things. I just need time, funding, and some
people educated better than I am in the nuances of scripture to
bounce ideas off of, as God assembles His team to complete this
project.

I look forward to seeing how it all ends. It would be really
amazing to see the message from God to man in the very fabric
of the life which He first created on this planet.

Best regards,
Rick C. Hodgin
Chris M. Thomasson
2015-07-23 02:48:17 UTC
Permalink
[...]
Post by Rick C. Hodgin
The number 40 is significant in the Bible. And in late 2014 I had
about it sometime shortly thereafter on my Facebook page. Here are
https://www.facebook.com/photo.php?fbid=10152679283083145&l=a0ce1c0139
FWIW, I had a weird lucid dream about fractals:

https://plus.google.com/101799841244447089430/posts/CctxeQSZ7EV

Even though I was totally lucid, I still could not manage to actually touch
the damn fractal surface!

Grrrr! I was a bit pissed off.

;^)
Walter Banks
2015-07-23 15:38:30 UTC
Permalink
Post by Rick C. Hodgin
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
Making a cpu design with three different ISA's is an extremely difficult
task. Supporting two ISA's is hard enough - you need only look at the
dog's breakfast Intel did with the Itanium's x86 support to see that
(and that's with a top team of experienced cpu designers, intimately
familiar with both ISA's). Supporting two general ISA's only makes
sense when you need to run legacy binaries - and thus only makes sense
when you have complete compatibility with existing systems. Do you
really expect people to run existing 80386 programs directly on this
chip?
No.
Post by David Brown
Or existing ARM programs?
No.
Post by David Brown
Without recompiling? I honestly think that would be highly unlikely.
It will require recompiling in all cases, but the hardware will be
similar enough (especially in i80386-x40 mode) so that everything
people know today about the i386 will be generally conveyed, with
a few extensions provided for by new flags.
The percentage of developers that know anything about the details of the
i386 is almost negligible. Almost everyone who codes for x86 systems
does so using C or a higher level language. The same is true of ARM,
except with ARM there is a larger number of low-level programmers who are
at least familiar with the assembly from microcontroller development.
The people that know 386 coding well enough to be able to do something
useful at that level are mostly:
1. People who write compilers (and they would all prefer to write
compilers for a different architecture than x86).
2. People writing very low-level code in operating systems (and they
would mostly prefer a different architecture than x86).
3. People writing highly optimised library code for time-critical
functions (and they use SIMD vector units, and care little for the
general purpose coding).
4. People who think assembly language is a fun and challenging hobby,
and use x86 because that's what they have on their PCs.
I suspect that you fall into all four categories here!
It's why I'm doing this for people. I want people to be able to use
the tools I provide them at the Visual FreePro level (graphical UI,
data engine, network communication, Internet) for general apps, and
to be able to dip down into my C compiler called RDC for low-level
stuff, and then into assembly for the lowest-level where required.
I want to use my skills to make other people's lives easier. And
even at the hardware level for low-level developers, and for hardware
people, I want to provide simple connectivity and easy APIs for
add-on devices.
Those are my targets: a robust platform that works, is extensible,
and easy to be adapted to by people who know how to adapt to it.
And for those who don't want to operate at those low levels, there
is a graphical Visual Studio-like environment which allows them to
write, test, and debug robust code for general apps.
Post by David Brown
But the point is that the great majority of other developers that might
use this chip will be happy if there is a decent C compiler for it, or
perhaps the other development tools that you hope to make. There is no
advantage to them in having any x86 compatibility at the cpu level.
But if you start with a clean slate and implement some novel and
interesting features, such as ones aimed at improved debugging
(automatic bounds checking, run-time code patching or thread-aware
breakpointing, for example), then they will see something special in
your cpu rather than just a slow, sort-of x86 device with a weird and
ugly way of breaking the 4GB barrier (something that was done long ago
with PAE on the Pentium Pro).
PAE was still limited to 4 GB per task. The machine could address more
than 4 GB, but as Linus Torvalds is quoted, "Only if you jumped through
hoops."
LibSF 386-x40 is a true 32-bit or 40-bit machine. One Terabyte per core
of local memory, along with a shared terabyte, resulting in three
terabytes of addressable memory per my highest-end design.
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
You will not be making things easy for
developers, but will be giving them the confusing choice of three ISA's
- one of which is completely unknown but presumably nice (your own one),
one of which is mostly like 386 but horrible to work with, and one that
is mostly like ARM but different enough to need new tools.
I don't generally plan on supporting legacy applications or operating
systems. I plan on completing my Exodus OS, and my Armodus OS, and
porting the unified form to my own hardware, and using my own Visual
FreePro language, and C compilers, creating my own hardware and
software stack as an offering unto the Lord. An open effort given to
mankind using the skills He first gave me, and not doing it for money.
That's what I thought was your plan, at least in the long term - though
you might have been expecting to have compatible tools and binaries as a
step towards that goal. And with that plan, I can't see how x86 or ARM
partial compatibility does anything but give you more work along the
way, and a poorer result in the end.
Nearly all of the code should work with a recompile. No real changes.
And this includes code which has been designed around x86 quirks,
including nearly all of the assembly language opcodes.
That's a huge base of code that can be ported with minimal effort.
I also eventually plan to provide a .NET compatible library, and a C#
compiler to allow that code to port over as well.
Post by David Brown
Post by Rick C. Hodgin
Post by David Brown
Thus I suggest that if you want to make a new cpu, either make it with a
single compatible existing ISA, or make it with a single new one that
fits the goals of the rest of your system better. (It occurs to me that
with your love of edit-and-continue style debugging, it is likely that
an ISA could be designed with this in mind to make it more powerful and
more efficient.)
I appreciate your advice.
I am aiming to help - and I am trying not to sound negative, though I
think you already know that I am sceptical of the level of ambition you
have here. My aim here is to recommend a clearer and more
time-efficient path towards your goal, as far as I can see it.
It is very ambitious, but I do not consider myself to be doing it
alone. I have a large amount of general knowledge on the tasks I
am pursuing. In some areas (x86, OS development, coding and
algorithms in general) I consider myself to be an expert. In the
areas of physical hardware, I had never touched Verilog before
last year, for example. But, it made perfect sense to me, and I
was able to run things through simulation which worked. I have
no doubts I can do it, but I am unproven.
And in the areas of true hardware design outside of the safe FPGA
environments, there is so much I will need help on. But, my
prayers are to the Lord, and I know that if it's part of His plan,
His will, that it will happen.
If not, it's been a very fun project and I've enjoyed it. It has
made me sad that others haven't come on board to help me though.
Perhaps in time that sadness will turn to joy.
I think having lots of parallel devices with their own compute abilities
will allow a slower central core to provide usable technology, especially
when you run multiple of them, and workloads for specific things are
off-loaded, such as a GPU handling its own rendering ops.
A couple of comments.

1) You need to spend some time at a different level looking at the
design issues of an ISA. One of the first things that comes out of ISA
design is that most are designed around a particular theme, with specific
goals and applications in mind, within the limitations of an
implementation technology. Their effectiveness requires a thought
process similar to a well-engineered piece of software.

2) ISA's are the building blocks (DNA) needed to translate applications
into executable code. This is not new to anyone. It would help to
develop a fundamental understanding of the actual process of doing this.
Look at a compiler in the abstract: a translation process that looks
at an application, extracts meaning out of the source code, and
devises an approach to solving that problem, this time on that target
environment.

Write a toy ISA and create a compiler for it. It is addictive, but it
would do a lot to help you understand the issues you have been raising.
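A toy ISA can start very small indeed; as a sketch, here is a hypothetical four-opcode accumulator machine in C (invented for this example, unrelated to Oppie-1 or any real ISA):

```c
#include <stdint.h>

/* Each instruction is one 16-bit word: opcode in the high byte,
 * an 8-bit immediate/address in the low byte. */
enum { OP_HALT = 0, OP_LDI = 1, OP_ADD = 2, OP_STORE = 3 };

static uint8_t mem[256];          /* tiny data memory */

/* Execute a program; returns the accumulator at OP_HALT. */
static int run(const uint16_t *prog)
{
    int acc = 0, pc = 0;
    for (;;) {
        uint16_t w  = prog[pc++];
        uint8_t op  = (uint8_t)(w >> 8);
        uint8_t arg = (uint8_t)(w & 0xFF);
        switch (op) {
        case OP_LDI:   acc = arg;               break; /* load immediate */
        case OP_ADD:   acc += arg;              break; /* add immediate  */
        case OP_STORE: mem[arg] = (uint8_t)acc; break; /* store to mem   */
        default:       return acc;              /* OP_HALT */
        }
    }
}
```

A four-instruction program { LDI 40, ADD 2, STORE 0, HALT } returns 42 and leaves 42 in mem[0]. Even at this scale, writing an assembler and a code generator against the machine exposes most of the encoding and addressing questions an ISA designer faces.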

w..
Rick C. Hodgin
2015-07-23 16:08:11 UTC
Permalink
Post by Walter Banks
Write a toy ISA and create a compiler for it. It is addictive but it
would do a lot to understand the issues you have been raising.
I have devised an ISA for my Oppie-1 project, written an assembler
for it, and an emulator:

https://github.com/RickCHodgin/libsf/tree/master/li386/oppie/oppie1

I am also currently working on two compiler projects. One for an XBASE
language similar to Visual FoxPro, the other a C-like language which
borrows some technology from C++, plus a few of my own add-ins. I am
also creating a virtual machine for Visual FreePro, which is designed
to have fundamental knowledge of objects at the ISA level:

https://github.com/RickCHodgin/libsf/tree/master/source/vjr/source/compiler
https://github.com/RickCHodgin/libsf/tree/master/documentation/vvm/OBED

It is an ongoing consideration in my life, and has been since I set
my mind to it on July 12, 2012.

BTW, the only reason I'm working on this project is because I found
out some very disturbing things about Richard Stallman. He is on
record saying that he believes certain things should be legal that
are heinous. I had contacted him before creating LibSF because I
wanted to complete GNU's kernel, the HURD. I asked him about how I
should proceed on that. He told me I shouldn't, that GNU doesn't
need a kernel because they have a good one in Linux. He pointed me
to a couple other projects and I began considering those things.
In the process I came across the information about Richard, and I
contacted him by email to verify if what I had seen was accurate or
not. He said it was.

I decided then and there that I could not support GNU or FSF, and
created LibSF as a Christian alternative. I have been proceeding
with all of my "free time" and efforts focused on this project ever
since. I have tried to gather support from others, and a few have
come forward, but most don't have the skills necessary to
contribute beyond prayer and non-technical support.

I continue to press on because I know who it is I'm pressing on for.
I would like to have help from other people, but the fundamental
direction I'm moving is in service to the Lord, and for pretty much
everyone I've encountered to date, that's a total deal breaker. I've
even had people tell me that flatly, that if I were to do it for
money they'd help, but to simply do it for the Lord ... nope.

It's been hurtful and it's been lonely, and this has been my place
of existence since I started this project over 3 years ago. It
hasn't been awful though, because I know who it is I'm writing all
of this work for ... and He is worth it.

I do appreciate your input. If I had more time I would go through
more iterations of preparation before committing myself. However,
I have a working knowledge of a lot of the software-side of hardware
interfaces, and I can see why they exist and understand it in that
philosophical way. And the path I'm taking has specific purposes
which I visualize and scope out within the framework of my faith,
realizing that it won't be something the world embraces, but it may
be something those who also believe very strongly embrace, and it
is for Him, and for them, that I labor.

Best regards,
Rick C. Hodgin
Quadibloc
2015-07-24 20:58:10 UTC
Permalink
Post by Rick C. Hodgin
He is on
record saying that he believes certain things should be legal that
are heinous.
That is odd, but without more information I cannot condemn this: there are
considerations other than the moral value of an act which determine its
suitability for legal prohibition. Such as the availability of police resources,
and side effects of such a law.

So one could take such a position without approving of that which is heinous.

John Savard
Robert Wessel
2015-07-24 21:15:02 UTC
Permalink
On Fri, 24 Jul 2015 13:58:10 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Rick C. Hodgin
He is on
record saying that he believes certain things should be legal that
are heinous.
That is odd, but without more information I cannot condemn this: there are
considerations other than the moral value of an act which determine its
suitability for legal prohibition. Such as the availability of police resources,
and side effects of such a law.
So one could take such a position without approving of that which is heinous.
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.

http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
Rick C. Hodgin
2015-07-24 21:43:32 UTC
Permalink
It's not something everyone can understand, at
least initially, but the fact is God instructs us on
how to be morally, ethically. As His creation, it
is His right. And when God instructs us to be a
particular way, it opens up the possibility thru
choice to be another way. When that happens,
the person is not doing what God prescribed,
but rather something else. God calls that "sin,"
and punishes those who sin because the universe
He made operates a particular way as per His
design, and when parts of that creation operate
contrary to His design, it's like a defective circuit
or component, malfunctioning.

We normally think of malfunctioning components
in terms of our creations: some device we've made,
but for God His needs are more extensive and
comprehensive because it is not one thing an
eternal being could corrupt, but how much?

Sin is like cancer. You cannot abide to live with
it. It must be completely excised, removed with
no part remaining. And that is what God is doing
with His creation.

Any voice, drive, desire, which purports some
thought or action contrary to that which God
has prescribed comes from God's enemy. It is
the voice of the anti-Christ enticing unto sin,
because the enemy knows how serious and
life-ending sin is in the unrepentant lifestyle.

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-24 22:00:33 UTC
Permalink
I was unable to commit myself to do work for
GNU or FSF because those are of Stallman, and
he is operating under the guidance of that anti-Christ
spirit. He would not understand this or believe
that it is true, nor would many others including
many who identify themselves as Christians.

The truth is there is a battle on for our eternal
soul. The enemy spirit works very hard to entice
and distract people by the myriad of ways and
opportunities in this world, but it is only God
who leads us rightly, which is why His is called
"The Holy Spirit," for it is separate, distinct, and
unique.

The truth always speaks with one voice. "How
old are you, Rick?" There is one correct answer
to that question, and many incorrect answers.
And this relationship to the truth, and to lies,
extends to all aspects of our lives.

When Stallman acquiesces to such an enticing
voice that is contrary to God, what else will He
do?

Am I perfect? Oh no. I make plenty of mistakes.
Some real doozies even. But my heart is focused
upon the Lord, and His Holy Spirit won't let me
abide in sin for very long, and the gift of guilt,
remorse, and repentance He pours out on me
allows me to come back to where I should be.

Those like Stallman who do not pursue God and
His Holy Spirit have no such shepherd, and they
instead blaze their own moral and ethical path
thru this world, ignoring all warnings from God,
and from God thru men like me. It is his choice
to do so, but it comes with a steep price: being
put away from everything God created for us to
thrive and flourish in ... for all eternity.

God's ways are right and true, and Jesus will
receive anyone who comes to Him with a pure
heart in search of the truth.

I pray Stallman sees the light before it's too late.
None of us are promised tomorrow, and we all
must seek Him today.

I pray this makes sense to each of you.

Best regards,
Rick C. Hodgin
Bruce Hoult
2015-07-25 00:56:36 UTC
Permalink
Post by Robert Wessel
On Fri, 24 Jul 2015 13:58:10 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Rick C. Hodgin
He is on
record saying that he believes certain things should be legal that
are heinous.
That is odd, but without more information I cannot condemn this: there are
considerations other than the moral value of an act which determine its
suitability for legal prohibition. Such as the availability of police resources,
and side effects of such a law.
So one could take such a position without approving of that which is heinous.
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
Reading RMS's actual words, I find little to disagree with:

https://stallman.org/articles/extreme.html

Stallman is completely correct to distinguish between things that some or even most people might find objectionable or immoral, and things that the secular state should make illegal. It's very different.


I don't agree with Stallman on everything. In particular I choose to put my own community contributions towards projects with BSD or MIT style licences, rather than to GPL projects. That is mainly to preserve my *own* future freedom to use these projects as part of commercial products I may be working on at mega-corporations.

But I'm very glad RMS has worked so hard and so long to produce the things he has, from emacs to gcc to the GNU clones of traditional Unix tools. This seems to me to be God's work.
Rick C. Hodgin
2015-07-25 01:19:50 UTC
Permalink
Post by Bruce Hoult
But I'm very glad RMS has worked so hard and so long to produce the
things he has... This seems to me to be God's work.
God is a jealous God. Jesus gave us this guidance on our actions
here in this world, and it would behoove us to remember that when
God flooded the Earth in the days of Noah, He recorded that there
were people marrying, and giving away in marriage, right up until
the day Noah entered the ark. This signifies that people were
engaged actively in the benefit of other people, helping them,
buying or creating gifts for them, sharing in their good times,
and the like. However, not one of them was saved because they
were operating outside the will of God. Only Noah, a preacher,
and his wife, and their three sons and their wives, eight people
in all, were saved. And to put an emphasis point on just how
heinous sin is, consider that God "started over" with a preacher,
and those members of his household which were involved actively
in this pursuit of God in their lives, and yet did sin end when
those who were sinful were destroyed? No. It returned because
of our sin nature, the result of Adam's original sin in the Garden
of Eden. We all have this sin nature, and it is leading us toward
our own destruction. It's why all of us need Jesus Christ.

Jesus said this very clearly, so that we would know His nature,
His purpose in our lives, compared to anything else, a reiteration
of that which I posted previously (the truth has one voice, but
there are many lies):

http://biblehub.com/matthew/12-30.htm
http://biblehub.com/luke/11-23.htm
"He that is not with me is against me: and he that gathereth
not with me scattereth."

It doesn't matter what we do in this world. If we are not giving
our lives over to Jesus Christ, in pursuit of truth, in pursuit of
those things God has guided us for and toward upon this Earth, we
are by definition following another voice, and that means we are
following the enemy of God, and that enemy has only one goal: the
destruction of everything God has created, including you and me.

Jesus alone sets us back on the right path by giving us new life
when our sin is forgiven. He receives our sin, and we receive His
sinless perfection, and because our sin is transferred to Him, we
have no sin any longer, and all is restored as it was before sin
entered the world. We are alive again spiritually (eternally),
and the very nature of our existence changes in that instance, so
that we are not the same after salvation as before. It's why it
is called being "born again." It is a literal rebirth into life,
one which fundamentally alters everything about a person from the
inside out, and completely comprehensively. It's why born again
people change so much, and begin speaking about Jesus, about our
need of salvation... they have experienced the change, and now
that their eyes have been opened, their mind restored, and they
can see clearly without the blinders that our initial fall and
the death it brings has placed upon us (toward Godly things),
the reality of reality is revealed to us, and it is not something
that can be ignored.

Born again people have a new nature, and it is fundamentally
different than the one found prior to being born again here in
this world because this world is in sin.

It's not something that a person who is not born again can
understand, but it is something that can be explained to people,
and those who have an ear to hear the truth, and begin to pursue
the truth, those who recognize their sin, repent, come to Jesus
Christ and ask forgiveness, for all of those it will become clear
what is being spoken about here once they are born again.

It is that salvation that the enemy is trying to keep all people
from coming to, because in Jesus Christ is real life, and that
is life eternal.

I will answer any questions you have if you would like to email
me privately. There is so much to know about God's Kingdom. A
person can study a lifetime and never learn it all, but this much
is certain: those things you have learned about Christians and
Christianity from any source other than the Bible, and God Himself,
and specifically when you are truly in pursuit of the truth, will
be a misleading teaching by the enemy designed to keep you away
from coming to a saving knowledge of Jesus Christ.

There really is a battle on for your eternal soul. Right now you
are lost, and the only way to be saved, to be redeemed, to be
restored to that which God originally intended for us in this
world, and in that which is to come, is to come to Jesus Christ
and ask forgiveness. It literally requires nothing else, for
within that coming to Him you will have been drawn by God the
Father, given the ability to repent and believe upon His sacrifice
at the cross for you.

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-25 01:40:34 UTC
Permalink
Post by Bruce Hoult
Post by Robert Wessel
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
https://stallman.org/articles/extreme.html
It's worth noting that the words quoted in the Wikiquote article are
his actual words. You can find them on one of his pages, in a section
dating from around 2006, IIRC. He also confirmed those words to me in
email when I contacted him. I specifically sent those quotes, and a
few others, and asked him if he was taken out of context, or if there
were any misquotes, etc. He affirmed all of them in July, 2012. His
doing so began a new chapter of my life.

I recognize not everyone will see these things as I am explaining
them. Until a person is born again it is not possible for them to
follow the voice of God apart from those attributes He's put into
our physical nature (a conscience, an inner sense of knowing right
from wrong, a basic moral compass, etc., all of which can be
dismissed through conscious thought, like a searing with a hot iron,
over time it becomes calloused and hard, and can no longer be
heard).

But it is God who guides us rightly, and the things that Stallman
believes in, purports, and thinks should be legal, are things that
God calls an abomination, and has specifically warned that the
punishment of such things is eternal death in Hell.

God created the universe, and He has given us guidance on how to
live in His creation. That guidance is not harmful or malicious.
It is benevolent and peaceful, loving and caring. The central
pillar of Christianity is Jesus Christ and the example He gave,
becoming a servant to one another so that we might make their
lives better. He did the impossible for us, performed the true
miracle of miracles, and then He called us to follow after Him,
to take up our cross and walk.

Our lives are to be in service to God, which naturally transcribes
into service to one another because He put us here upon this Earth
as He did to create a community. He cares tremendously for us,
and it is sin He despises, and all who embrace sin will be put to
eternal shame in the lake of fire. But for those who desire to
walk rightly, to seek Him as He is, to be a part of His Kingdom,
those who come to repentance, and ask forgiveness, and continue on
in repentance, they are those whom God is allowing all of this
stuff on Earth to transpire, because of those who will be saved.
He is letting the horrible things happen because there are still
more people who will be saved, and they are so precious in His
sight that what is happening with man (condemnation for sin in
the lake of fire forever) cannot compare to the precious nature
of those who will repent and be saved, for each of us was made
in the very image of God, in His own likeness, being comprised
as He is with three parts: soul, spirit, body.

We are amazing creations, and God does not want any to be lost
when there are still some who will be saved.

Jesus said it this way:

http://biblehub.com/kjv/matthew/13.htm
(read verse 24 thru 30)
"24 Another parable put he forth unto them, saying, The
kingdom of heaven is likened unto a man which sowed
good seed in his field:
25 But while men slept, his enemy came and sowed tares
among the wheat, and went his way.
26 But when the blade was sprung up, and brought forth
fruit, then appeared the tares also.
27 So the servants of the householder came and said unto
him, Sir, didst not thou sow good seed in thy field?
from whence then hath it tares?
28 He said unto them, An enemy hath done this. The servants
said unto him, Wilt thou then that we go and gather them
up?
29 But he said, Nay; lest while ye gather up the tares, ye
root up also the wheat with them.
30 Let both grow together until the harvest: and in the time
of harvest I will say to the reapers, Gather ye together
first the tares, and bind them in bundles to burn them:
but gather the wheat into my barn."

He is talking not about wheat and tares, but about us. People.
Those who are His, and those who are not His.

Sin is that heinous, destructive, utterly harmful. Consider that
one sin by Adam and Eve ... all the hate. All the death. All the
wars. All the struggles. Sin destroys like cancer. It's why it
is being put away. Forever.

Best regards,
Rick C. Hodgin
Bruce Hoult
2015-07-25 02:04:36 UTC
Permalink
Post by Rick C. Hodgin
But it is God who guides us rightly, and the things that Stallman
believes in, purports, and thinks should be legal, are things that
God calls an abomination, and has specifically warned that the
punishment of such things is eternal death in Hell.
This paragraph is sufficient to show us how ideologically blinded you are.

Let me try to put it in your terms (which neither I nor Stallman believe in, but no matter).

Stallman does not say he approves of those things. He does not say he does them (I am sure he does not). He does not claim that you might not in fact be placing your mortal soul in danger by doing those things, as you apparently believe.

He says that these things are not rightly the concern of man's earthly government and laws. They are for God to judge, not man.
Rick C. Hodgin
2015-07-25 02:21:49 UTC
Permalink
Post by Bruce Hoult
He says that these things are not rightly the
concern of man's earthly government and
laws. They are for God to judge, not man.
God has judged them. And because He cares
about us, and WILL judge unrepentant sin, He
has told us His views on these and many other
matters. His guidance is given to us so we will
move rightly here upon this Earth, individually,
and societally as per our laws, morals, ethics,
and philosophies.

His is a loving, guiding hand.

Best regards,
Rick C. Hodgin
Melzzzzz
2015-07-25 03:35:26 UTC
Permalink
On Fri, 24 Jul 2015 19:21:49 -0700 (PDT)
Post by Rick C. Hodgin
Post by Bruce Hoult
He says that these things are not rightly the
concern of man's earthly government and
laws. They are for God to judge, not man.
God has judged them. And because He cares
about us, and WILL judge unrepentant sin, He
has told us His views on these and many other
matters. His guidance is given to us so we will
move rightly here upon this Earth, individually,
and societally as per our laws, morals, ethics,
and philosophies.
His is a loving, guiding hand.
No, no, no. God changed His mind last week. He told me that there is a
new Covenant: everything is possible and impossible at the same time.
He also told me that He erased the lake of fire and made a lake of jelly
with shiny beaches. This is because He forgave our sins and made Satan a
friend. He also told Santa not to advertise Coca-Cola any more.
There is more, but this is the major thing!
EricP
2015-07-25 01:47:42 UTC
Permalink
Post by Robert Wessel
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
I guess this would be a bad time for a joke.... Oh what the hell...


Well, that could explain all those 'NAMBLA' variables in his code.


Eric
Chris M. Thomasson
2015-07-25 02:14:44 UTC
Permalink
Post by EricP
Post by Robert Wessel
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
I guess this would be a bad time for a joke.... Oh what the hell...
Well, that could explain all those 'NAMBLA' variables in his code.
YIKES!!!! :^O

Those bastards should be arrested and tossed into the middle of the
general population of a hard core prison with a dunce hat on that says
"prisoners are all total morons!".

Grrrrrrr!

:^|
Chris M. Thomasson
2015-07-25 02:29:49 UTC
Permalink
Post by Chris M. Thomasson
Post by EricP
Post by EricP
Post by Robert Wessel
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
I guess this would be a bad time for a joke.... Oh what the hell...
Well, that could explain all those 'NAMBLA' variables in his code.
YIKES!!!! :^O
Those bastards should be arrested and tossed into the middle of the
general population of a hard core prison with a dunce hat on that says
"prisoners are all total morons!".
I read the wiki, and:

Well, IMVHO, this guy is a dangerous kook. Big time. This son of a bitch
thinks it's okay to do that horrible act with kids! WTF!?!?!?!

Sounds like your joke is right on par Eric: Fu%k that bastard.

I will never feel the same again when I use GCC.

Sigh. ;^/
Melzzzzz
2015-07-25 03:48:07 UTC
Permalink
On Fri, 24 Jul 2015 19:29:49 -0700
Post by Chris M. Thomasson
Post by Chris M. Thomasson
Post by EricP
Post by EricP
Post by Robert Wessel
RMS has made some statements on sexuality that most people would find
(at least) a bit over the top.
http://en.wikiquote.org/wiki/Richard_Stallman#On_sex
I guess this would be a bad time for a joke.... Oh what the hell...
Well, that could explain all those 'NAMBLA' variables in his code.
YIKES!!!! :^O
Those bastards should be arrested and tossed into the middle of the
general population of a hard core prison with a dunce hat on that
says "prisoners are all total morons!".
Well, IMVHO, this guy is a dangerous kook. Big time. This son of a
bitch thinks its
okay to do that horrible act with kids! WTF!?!?!?!
Well, the Serbian king (and saint) Milutin (medieval era) married the
5-year-old girl Simonida... He was 45, and he had sex with her when she
was 8. That was his fifth marriage...

Michael S
2015-07-22 17:41:09 UTC
Permalink
Post by David Brown
I've snipped some things here, and some of my comments are more directed
at previous posts than the one quoted here, just to make a single post
rather than multiple ones.
You should make a decision clear - is this design going to be x86
compatible or not? If it is to be x86 compatible, then you should aim
to make it as close as possible to 100% to an existing x86 device
(perhaps excluding the legacy modes) in order to be able to re-use
software, tools and knowledge from the x86 world. As soon as you stray
from that, and you no longer have 100% binary compatibility, then you
might as well drop all connection with x86 - partial ISA compatibility
is no better or worse than zero compatibility. And when you have
dropped any thoughts about x86, you can look around at the far better
existing ISAs, and then create a new one that is even better :-) The
x86 world is living proof that you /can/ polish a turd - but it is not
an architecture to emulate.
Assuming you are not interested in x86 for this design, the next
question is whether or not you want to aim for compatibility with
something else. You should look closely at a number of architectures -
I would include ARM, ColdFire, MIPS, SPARC, OpenRISC, and perhaps
PowerPC as modern 32-bit or 64-bit cores to consider. You might decide
that one of these suits your needs, and you can then take advantage of
existing tools (at least until you have got your own ones up to speed).
Even if you don't want to copy them directly, they can give you
inspiration and ideas for your own design.
(Note that there may be licensing, copyright or patent issues involved
here, if that concerns you.)
I think SPARC is free.
However, somebody who did both SPARC and x86 suggested here on comp.arch a decade or so ago that it's easier to make x86 fly than SPARC.

If I am not mistaken, POWER is not free in the "free speech" sense, but is free in the "free beer" sense, esp. for non-commercial use.
But, of course, POWER is a very complex ISA by itself, although from a performance perspective the complexity of POWER is less problematic than some issues in the "simpler" SPARC.

MIPS and ARM are certainly non-free.
However, people have cloned MIPS many times by modifying the ISA just a little, and have gotten away with it. That includes such big commercial companies as Altera and Xilinx.
I have never heard of similar ARM clones, but there are many things I have never heard about.
Post by David Brown
Post by Rick C. Hodgin
I think interrupts should be isolated to servicing hardware requests,
and not used as a general interfacing mechanism between software and
its OS as by API. I think there should be dedicated mechanisms for
that expected and anticipated OS/app relationship, using a system
which is setup by the OS at process load time, thereby validating
their existence and security, and then allowing that interface to
operate at full speed without consistent validation. This exists
in various ISAs already, and I think it's the way it should be.
I further believe interrupts should have their own isolated stack
which does not impede data upon the process stack, allowing for
such things as forward stack usage without prior allocation as
for fast and immediate temporary variable usage.
It is common to have separate stack pointers for user space and
supervisor space, and perhaps also for interrupts (though that is often
just part of supervisor space). For RISC architectures that don't have
a dedicated stack pointer, but merely conventional usage of a register
as a stack pointer, this is of course handled in software.
Post by Rick C. Hodgin
And I have several other thoughts on stacks and inter-process and
inter-function communication protocols.
If I were designing a new architecture, then interprocess communication
would be a key aspect. I would be aiming for some sort of hardware
semaphores, and perhaps pipelines or message queues, along with hardware
to make thread and task switching efficient. My interest is in embedded
systems rather than general-purpose cpus, but the principle is the same.
An architecture for inspiration here would be XMOS.
Since you are also interested in language design and OS design, it might
be better to start higher up in the process. Think what sort of
features your languages need, and how that might be best implemented in
hardware - and look to making that part of the ISA. To take extreme
examples, if you were a Forth fan then your cpu should be able to work
efficiently with a data stack, but would not need many registers. If
you want to work with video data, then strong SIMD support would be key.
If you are aiming for safety over efficiency, then you might want
hardware that makes it easy to include bounds along with pointer
operations. If you wanted C++ support, then perhaps you would have
hardware dedicated to fast vtable accesses and exceptions. And so on.
Similarly, consider how your OS will work and what features it could
benefit from - putting them in hardware can make the code easier and
more efficient.
And don't be afraid to limit options if it is more practical - there is
no need to support up and down stacks if all usage of the chip will be
with a grow-down stack. Too many choices is not much better than too
few choices.
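[Editor's note: the hardware IPC features suggested above (semaphores, message queues, XMOS-style channels) are, on conventional cores, emulated in software on top of atomic operations. As a point of comparison, here is a minimal software sketch of the message-queue idea using Python's standard library; the names and queue depth are illustrative only.]

```python
import threading
import queue

# Bounded message queue between two threads: the software analogue of a
# hardware mailbox/channel. put() blocks when the queue is full, get()
# blocks when it is empty, so the queue itself provides the handshaking.
chan = queue.Queue(maxsize=4)
results = []

def producer():
    for i in range(5):
        chan.put(i)          # may block until the consumer drains a slot
    chan.put(None)           # sentinel: end of stream

def consumer():
    while (msg := chan.get()) is not None:
        results.append(msg * 2)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)               # [0, 2, 4, 6, 8]
```

A dedicated hardware implementation would collapse the locking and blocking inside put()/get() into single instructions, which is precisely the efficiency argument being made above.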
David Brown
2015-07-22 18:38:47 UTC
Permalink
Post by David Brown
I've snipped some things here, and some of my comments are more
directed at previous posts than the one quoted here, just to make a
single post rather than multiple ones.
You should make a decision clear - is this design going to be x86
compatible or not? If it is to be x86 compatible, then you should
aim to make it as close as possible to 100% to an existing x86
device (perhaps excluding the legacy modes) in order to be able to
re-use software, tools and knowledge from the x86 world. As soon
as you stray from that, and you no longer have 100% binary
compatibility, then you might as well drop all connection with x86
- partial ISA compatibility is no better or worse than zero
compatibility. And when you have dropped any thoughts about x86,
you can look around at the far better existing ISAs, and then
create a new one that is even better :-) The x86 world is living
proof that you /can/ polish a turd - but it is not an architecture
to emulate.
Assuming you are not interested in x86 for this design, the next
question is whether or not you want to aim for compatibility with
something else. You should look closely at a number of
architectures - I would include ARM, ColdFire, MIPS, SPARC,
OpenRISC, and perhaps PowerPC as modern 32-bit or 64-bit cores to
consider. You might decide that one of these suits your needs, and
you can then take advantage of existing tools (at least until you
have got your own ones up to speed). Even if you don't want to copy
them directly, they can give you inspiration and ideas for your own
design.
(Note that there may be licensing, copyright or patent issues
involved here, if that concerns you.)
I think, SPARC is free. However, somebody who did both SPARC and x86
suggested here on comp.arch decade or so ago that it's easier to make
x86 fly than SPARC.
If I am not mistaken, POWER is not free in a "free speech" sense, but
is free in "free beer" sense, esp. for non-commercial use. But, of
course, POWER is very complex ISA by itself, although from
performance perspective complexity of POWER is less problematic than
some issues in "simpler" SPARC.
I don't think it would be a realistic idea to copy SPARC or PowerPC
directly, merely that Rick might get some interesting ideas by reading
about those architectures as well as the others mentioned.
MIPS and ARM are certainly non-free. However people cloned MIPS many
times by modifying ISA just a little and got away with that. That
includes such big commercial companies as Altera and Xilinx. I never
heard about similar ARM clones, but there are many things I never
heard about.
Rick C. Hodgin
2015-07-22 19:04:31 UTC
Permalink
Post by David Brown
[snip]
I don't think it would be a realistic idea to copy SPARC or PowerPC
directly, merely that Rick might get some interesting ideas by reading
about those architectures as well as the others mentioned.
I have been looking at many different ISAs, and will continue to do
so. However, I am not going to venture too far away from x86 and ARM,
and that is by design.

If I do venture away, it will be via a separate add-on processor that
can operate in a unified LibSF x40 CPU, but one that's explicitly
designated as a heterogeneous core design, likely LibSF 100-x40, and
then LibSF 200-x40, etc.

Best regards,
Rick C. Hodgin
MitchAlsup
2015-07-22 23:19:18 UTC
Permalink
Post by Michael S
I think, SPARC is free.
This is misleading. Whereas anyone (and his brother) can copy the ISA of
SPARC, the probability that one could use the available information and create
a chip that would boot one of the current OSs is 0.00002%.
So, while the ISA is free, things like the MMU are not even available.
Quadibloc
2015-07-24 20:52:48 UTC
Permalink
My understanding is that SPARC has a nominal licence fee of $99 which lets you
implement it even commercially. There were easy terms for PowerPC as well, but
they may have changed.

x86, on the other hand, is not only not free, it's not for sale at any price. Intel got stuck with having AMD around.

John Savard
David Brown
2015-07-22 14:18:35 UTC
Permalink
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
snip
So they are all doable, but grow down with SP pointing to the
first byte of the last object written has the nicest attributes.
Eric
This seems to be an architectural decision issue, rather than a real
one. A system designer could enforce in hardware a protocol which
makes SP point to the last entry on an expand-up segment, making the
protocol for writing a new value also be completely atomic so that
interrupts do not occur in the middle of stack writes, performing
the write and then increment in a like manner as it is today in
expand-down segments.
Well, it is a sort-of-architectural decision. Some of the
permutations of the listed items produce non-functional results.
Of the functional ones, on balance it seems to (marginally) have
the most optimal attributes.
I say sort-of-architectural because the decision to pick one
approach and embed it in the architecture is not a functional necessity.
It is a historical fact that PDP-11 and 8080 (and others) chose to
embed a stack in their architecture. That decision comes because
a PUSH or POP instruction saves a byte here and there.
In those days, that could make the difference between a design win and a loss.
Whatever method one chooses shouldn't require a non-interruptible sequence.
Or to put it another way, if one design requires a non-interruptible
sequence then I would view that as an unacceptable option.
One does have to be aware of the potential for interrupts
in the choice of design, even in user mode code.
Post by Rick C. Hodgin
I don't see this as a valid argument for expand-down segments, but
only one which would require those accommodations be made in hardware.
And in truth, I can see that something nearly identical must also
exist in expand-down segments, as by a similar protocol.
Best regards,
Rick C. Hodgin
These days, PUSH and POP are deprecated because the runtime prefers
to maintain stacks at some larger alignment, 16 bytes I think for Win64.
Also a sequence of PUSH's or POP's each change the SP, causing
multiple renames and an unnecessary dependency chain.
The modern sequence would be more RISC-ish, using 1 subtract
and a sequence of indexed stores.
And of course, RISC doesn't need to embed a stack in an ISA at all.
Even for kernel mode interrupts a variation on branch-and-link makes
an architecture stack unnecessary, as I have described here previously.
So this is all for historical compatibility.
Eric
I have thought through a few. I may be missing something though.
I am considering here expand-down and expand-up operations using 32-bit
x86 architectural references for the example. I consider all stack
operations to be atomic, such that when something which alters the
stack begins it will complete without error. The only exception I see
to this would be a stack fault. In that case, in my view, the processor
should handle that condition in a special case because when the stack is
compromised for the required access, it can't be used for that access
and it must switch to another stack buffer designed for that error
condition.
-----
esp = 0xfc |........| 0xfc ... No data on stack
|........| 0xf8
|........| 0x04
esp = 0x00 |........| 0x00 ... No data on stack
-----
PUSH DWORD PTR 12345678h
(1) Decrement esp
(2) Write 0x12345678
_[top]_________
|........| 0xfc ... Not used
esp = 0xf8 ... |12345678| 0xf8 ... esp pointing to pushed value at 0xf8
|........| 0xf4
(1) Increment esp
(2) Write 0x12345678
|........| 0x08
esp = 0x04 ... |12345678| 0x04 ... esp pointing to pushed value at 0x04
_[top]_________|........| 0x00 ... Not used
-----
ENTER 4,0
(1) Decrement esp
(2) Copy esp to ebp
(3) Write old ebp
(4) Subtract 4 from esp for ENTER 4,0
_[top]_________
|........| 0xfc ... Not used
|12345678| 0xf8 ... Saved value from PUSH at [ebp+4]
|old ebp | 0xf4 ... Old ebp save value at [ebp]
esp = 0xf0 |xxxxxxxx| 0xf0 ... esp pointing to local variable [ebp-4]
(1) Increment esp
(2) Copy esp to ebp
(3) Write old ebp
(4) Add 4 to esp for ENTER 4,0
esp = 0x0c |xxxxxxxx| 0x0c ... esp pointing to local variable [ebp+4]
|old ebp | 0x08 ... Old ebp save value at [ebp]
|12345678| 0x04 ... Saved value from PUSH at [ebp-4]
_[top]_________|........| 0x00 ... Not used
-----
(1) Subtract esp,16
(2) Copy esp to edi
_[top]_________
|........| 0xfc ... Not used
|12345678| 0xf8 ... Saved value from PUSH at [ebp+4]
|old ebp | 0xf4 ... Old ebp save value at [ebp]
esp = 0xf0 |xxxxxxxx| 0xf0 ... esp pointing to local variable [ebp-4]
|bbbbbbbb| 0xec
|bbbbbbbb| 0xe8
|bbbbbbbb| 0xe4
esp = 0xe0 |bbbbbbbb| 0xe0 ... esp pointing to start of buffer
(1) Copy esp to edi
(2) Add esp,16
esp = 0x1c |bbbbbbbb| 0x1c ... esp pointing to end of buffer - 4
|bbbbbbbb| 0x18
|bbbbbbbb| 0x14
|bbbbbbbb| 0x10
|xxxxxxxx| 0x0c ... esp pointing to local variable [ebp+4]
|old ebp | 0x08 ... Old ebp save value at [ebp]
|12345678| 0x04 ... Saved value from PUSH at [ebp-4]
_[top]_________|........| 0x00 ... Not used
Best regards,
Rick C. Hodgin
I think you have missed Eric's point here. Modern x86 compilers (or
compilers on other large processors) don't use PUSH and POP, and
certainly not ENTER or LEAVE instructions. If the compiler needs space
on the stack for temporary variables (or passing arguments to functions
if there are too many for registers), it does not use PUSH. It
decrements the stack pointer in cache-aligned chunks, then accesses the
stack slots using [SP + offset] addressing modes. This is often faster
(since the stack manipulation is typically done only once at function
entry), and keeps the stack cache-aligned for maximum efficiency. Frame
pointers ("BP" in x86 terminology) are only used when you have stack
usage that is not known at compile time - why go to the effort of
storing and calculating a frame pointer at run-time when the compiler
knows it at compile time?
Rick C. Hodgin
2015-07-22 14:29:19 UTC
Permalink
Post by David Brown
[snip]
I think you have missed Eric's point here.
Nope.

I don't care about highest efficiency, only reasonable efficiency, and
I place far greater emphasis on simplistic use for software developers,
and those who would create hardware for my CPU. I want to give them
easy connectivity and simplistic APIs that allow for rapid development
with little debugging.

The hardware itself is fast enough, and only getting faster (IBM's
recent announcement using "exotic" non-silicon-only substrates). And
my goals are to move to a sapphire substrate along the lines of those
used by Murata (Peregrine Semiconductor) and their Ultra-CMOS process.
On -2 and -3 generation processes, they're achieving switch speeds in
excess of 10 GHz in logic because there is no parasitic capacitance.

Once those technologies go out of patents (a few years away), it will
be my target. Lord willing by then I will have my design taped out
(at least in FPGA form).

Best regards,
Rick C. Hodgin
David Brown
2015-07-22 18:04:22 UTC
Permalink
Post by Rick C. Hodgin
Post by David Brown
[snip]
I think you have missed Eric's point here.
Nope.
I don't care about highest efficiency, only reasonable efficiency, and
I place far greater emphasis on simplistic use for software developers,
and those who would create hardware for my CPU. I want to give them
easy connectivity and simplistic APIs that allow for rapid development
with little debugging.
The hardware itself is fast enough, and only getting faster (IBM's
recent announcement using "exotic" non-silicon-only substrates). And
my goals are to move to a sapphire substrate along the lines of those
used by Murata (Peregrine Semiconductor) and their Ultra-CMOS process.
On -2 and -3 generation processes, they're achieving switch speeds in
excess of 10 GHz in logic because there is no parasitic capacitance.
Once those technologies go out of patents (a few years away), it will
be my target. Lord willing by then I will have my design taped out
(at least in FPGA form).
Realistically, you will never have enough money to be able to make a
design targeting these sorts of processes. To have any hope of getting
a new cpu popular enough to gain momentum of usage, aim to make it fit
on low-end FPGAs on cheap boards (i.e., Altera Cyclone or Xilinx Spartan
devices running at a maximum of a couple of hundred MHz).

That means keeping the design simple - which will also make it easier if
you ever get the chance to scale up to ASICs or dedicated silicon, as
well as being easier to do the design and get it running bug-free.

And it is always easier to think of things such as stack alignment
/now/, rather than as an afterthought - it is easy to add flexibility at
a later stage, but not additional restrictions.
Rick C. Hodgin
2015-07-22 18:07:26 UTC
Permalink
Post by David Brown
[snip]
Realistically, you will never have enough money to be able to make a
design targeting these sorts of processes. To have any hope of getting
a new cpu popular enough to gain momentum of usage, aim to make it fit
on low-end FPGAs on cheap boards (i.e., Altera Cyclone or Xilinx Spartan
devices running at a maximum of a couple of hundred MHz).
That means keeping the design simple - which will also make it easier if
you ever get the chance to scale up to ASICs or dedicated silicon, as
well as being easier to do the design and get it running bug-free.
And it is always easier to think of things such as stack alignment
/now/, rather than as an afterthought - it is easy to add flexibility at
a later stage, but not additional restrictions.
Things are only impossible until they are possible. The Lord is on my
side and I am giving my offering to Him. If it succeeds, it will be
because He has allowed it to succeed. If it fails, it will be because
it was not something He allowed. My work and efforts are unto Him, and
I am moving with all of the free time I have in my life toward these
ends. I am being greatly hampered by many life things, but the Lord
knows the sincerity with which I proceed. It will be His when it is
completed. And if not, then it will be the entirety of my life's work
since 2012 when I created Liberty Software Foundation, resolving to
work on these projects, offered unto Him. And offering the things you
do to Him, and to the other people in this world, it's not a bad way
to live one's life.

Best regards,
Rick C. Hodgin
Michael S
2015-07-22 17:25:15 UTC
Permalink
Post by David Brown
[snip]
I think you have missed Eric's point here. Modern x86 compilers (or
compilers on other large processors) don't use PUSH and POP, and
certainly not ENTER or LEAVE instructions. If the compiler needs space
on the stack for temporary variables (or passing arguments to functions
if there are too many for registers), it does not use PUSH. It
decrements the stack pointer in cache-aligned chunks, then accesses the
stack slots using [SP + offset] addressing modes. This is often faster
(since the stack manipulation is typically done only once at function
entry), and keeps the stack cache-aligned for maximum efficiency. Frame
pointers ("BP" in x86 terminology) are only used when you have stack
usage that is not known at compile time - why go to the effort of
storing and calculating a frame pointer at run-time when the compiler
knows it at compile time?
I think modern x86 compilers do use PUSH and POP to save and restore callee-saved registers. I also think that on modern big Intel cores this is actually slightly faster than the RISC-like sequence that you suggest, due to denser code, though you are unlikely to see the difference in microbenchmarks. Not sure about modern big AMD cores.
Rob Warnock
2015-07-15 10:33:56 UTC
Permalink
Rick C. Hodgin <***@gmail.com> wrote:
+---------------
| EricP wrote:
| > All the others require either more arithmetic ops, or temp registers,
| > or leave a window where an interrupt could allow a value to be clobbered.
| >
| > For example, for a grow up stack, where SP points to
| > the next byte to write, you can't do a store then post increment
| > because an interrupt might clobber the value between those operations.
| > So you have to copy SP first, then adjust SP, then store,
| > and that takes an extra temp register.
|
| This seems to be an architectural decision issue, rather than a real
| one. A system designer could enforce in hardware a protocol which
| makes SP point to the last entry on an expand-up segment, making the
| protocol for writing a new value also be completely atomic so that
| interrupts do not occur in the middle of stack writes, performing
| the write and then increment in a like manner as it is today in
| expand-down segments.
+---------------

And that's exactly how the DEC PDP-10 up-growing stacks worked.
The hardware stack ops [PUSH/POP, PUSHJ/POPJ] and other ops
referencing the stack via indexing [e.g., MOVE T1,(P),
ADD T2,-3(P)] assumed that the stack pointer addressed the
last *valid* word of the stack, *not* the first as-yet-unwritten
word. Therefore EricP's concerns about interrupts were unfounded
on that machine [or any ISA in which the stack pointer points to
a valid stack position].

Note: The PDP-10's stack ops worked on any of the 16 general
registers, but after a certain point in time [roughly coinciding
with the "5-Series Monitor Coding Conventions"] most software
used register 15 (17 octal) as "the" stack, conventionally given
the symbolic name "P". [It was not required that user code use
the same register for the stack as the kernel, as system calls
did not refer to the stack per se.]


-Rob

-----
Rob Warnock <***@rpw3.org>
627 26th Avenue <http://rpw3.org/>
San Mateo, CA 94403
Michael S
2015-07-15 13:01:33 UTC
Permalink
Post by Rob Warnock
[snip]
And that's exactly how the DEC PDP-10 up-growing stacks worked.
The hardware stack ops [PUSH/POP, PUSHJ/POPJ] and other ops
referencing the stack via indexing [e.g., MOVE T1,(P),
ADD T2,-3(P)] assumed that the stack pointer addressed the
last *valid* word of the stack, *not* the first as-yet-unwritten
word. Therefore EricP's concerns about interrupts were unfounded
on that machine [or any ISA in which the stack pointer points to
a valid stack position].
[snip]
As pointed out by EricP, the problem with the push=pre-increment, pop=post-decrement scheme is that at the point of a push the processor must somehow know the size of the item that is currently on top of the stack. Things get especially complicated when the "task" stack is used by [asynchronous] interrupts to save context, but even with a separate interrupt stack it is going to complicate compilers and calling conventions.
Without knowing all the details, I'd assume that the PDP-10 avoided the problem because it was a word-addressable machine and everything that mattered had the same size = 1 word.
Rob Warnock
2015-07-16 15:12:06 UTC
Permalink
[Sorry for the long ancient history trivia.
Probably should have gone on a.f.c. ;-} ]

Michael S <***@yahoo.com> wrote:
+---------------
| Rob Warnock wrote:
| > And that's exactly how the DEC PDP-10 up-growing stacks worked.
| > The hardware stack ops [PUSH/POP, PUSHJ/POPJ] and other ops
| > referencing the stack via indexing [e.g., MOVE T1,(P),
| > ADD T2,-3(P)] assumed that the stack pointer addressed the
| > last *valid* word of the stack, *not* the first as-yet-unwritten
| > word. Therefore EricP's concerns about interrupts were unfounded
| > on that machine [or any ISA in which the stack pointer points to
| > a valid stack position].
...
| As pointed out by EricP, the problem with the push=pre-increment,
| pop=post-decrement scheme is that at the point of a push the processor
| must somehow know the size of the item that is currently on top of the
| stack. Things get especially complicated when the "task" stack is used
| by [asynchronous] interrupts to save context, but even with a separate
| interrupt stack it is going to complicate compilers and calling
| conventions. Without knowing all the details, I'd assume that the PDP-10
| avoided the problem because it was a word-addressable machine and
| everything that mattered had the same size = 1 word.
+---------------

Yes, the PDP-10 was a word addressable machine; and, yes, the
hardware stack ops PUSH, POP, PUSHJ, & POPJ only pushed or popped
one word onto or off of the stack. But no, it was quite common
to have objects larger than a single word on the stack. There
was no problem of "somehow knowing the size of the item that
[was] currently on top of the stack", since the same entity
that pushed a multi-word object onto the stack [be it a compiler
or a person writing assembler (very common in those days)] was
responsible for removing it. Discarding a group of objects
from the top of the stack was as simple as executing a single
"SUB P,[n,,n]", where "n" is the sum of the sizes in words.

Oh, and said items would have been pushed onto the stack in
the first place by either PUSH-ing the individual words or
doing an "ADD P,[n,,n]" to reserve the space and then doing
a sequences of stores or a BLT [BLock Transfer] to copy the
object(s) onto the stack. Same as x86 or PDP-11, really, only
in the opposite direction.

Sometimes frame pointers were used, with negative indexing
for args and positive indexing for locals, and sometimes only
zero- and negative-indexing off the stack. What was never done
was accessing memory that hadn't already been "covered" by a
stack pointer adjustment. [PUSH/PUSHJ adjusted the stack first,
*then* copied data; the reverse for POP/POPJ.]

And, yes, the PDP-10 avoided the problem of hardware pushing
interrupt save context onto the system stack by not *having*
stack-based interrupts per se!! ;-} Instead, an interrupt
"executed" an instruction in a per-interrupt-level memory
location, which (usually) contained a "JSR" instruction which
stored the return address and flags into a per-interrupt-level
service routine and then jumped to the following location.
[Yes, somewhat "self-modifying" code, but "JSR" was rarely used
other than in this context.] The per-interrupt-level service
routine used a "skip chain" of I/O instructions to find out
which device had interrupted on that level, but eventually
some device service routine would save all the user registers
in per-interrupt-level save areas [or maybe a per-process
data block, I forget] and then set up its own stack to process
the interrupt.

[I'm ignoring the case of a BLKI/BLKO instruction in an interrupt
location, as the "transfer complete" actions for BLKI/BLKO generally
follow the above outline anyway.]

There were seven interrupt priority levels. Interrupts at a given
level were not re-entrant, so "only" seven static per-interrupt-level
save areas and stacks were needed. [Per-processor, that is. The
PDP-10 system supported SMP with up to six(?) CPUs.]


-Rob

EricP
2015-07-16 17:04:04 UTC
Permalink
Post by Rob Warnock
+---------------
| > All the others require either more arithmetic ops, or temp registers,
| > or leave a window where an interrupt could allow a value to be clobbered.
| >
| > For example, for a grow up stack, where SP points to
| > the next byte to write, you can't do a store then post increment
| > because an interrupt might clobber the value between those operations.
| > So you have to copy SP first, then adjust SP, then store,
| > and that takes an extra temp register.
And that's exactly how the DEC PDP-10 up-growing stacks worked.
The hardware stack ops [PUSH/POP, PUSHJ/POPJ] and other ops
referencing the stack via indexing [e.g., MOVE T1,(P),
ADD T2,-3(P)] assumed that the stack pointer addressed the
last *valid* word of the stack, *not* the first as-yet-unwritten
word. Therefore EricP's concerns about interrupts were unfounded
on that machine [or any ISA in which the stack pointer points to
a valid stack position].
Interrupts are not a concern there because you are doing a
single word operation on a word oriented cpu (no bytes),
and doing it the right way: adjust pointer THEN store.

What I said was that stack sequences, other than grow-down
with SP pointing to the first byte of the last PUSHed value,
require either a temp register, or extra ALU operations,
or would leave a window for interrupts to clobber the value.

In those days, everything cost cycles. Accessing memory using a
register address cost X, indexing off that same register cost X+Y.
First it had to fetch the index, often an extra memory cycle.
Then move the index and address to the ALU (cycles depend on the number
of internal buses and their width), an ADD operation (depends
on the width of the ALU - e.g. one 16-bit DG Nova had a 4-bit ALU),
and a move to MAR.

So on a 16 bit cpu, to push a 2 byte word onto a grow up
stack where SP points to the last valid byte pushed:
SP = SP + 2
tmp = SP - 1
store (tmp, src, 2)

requires an extra subtract operation to point to the word start
that a grow down stack does not. That extra ALU op could cost 1 or 2
extra cycles, or a dozen, depending on the number of internal buses,
their widths, and ALU width.

Eric
Michael S
2015-07-15 22:11:00 UTC
Permalink
Post by EricP
Post by Rick C. Hodgin
In a protected architecture design like i386 and later, are there
any advantages to using an expand-down stack compared to an
expand-up stack?
Why would someone choose one design over another?
A stack can either grow down or up, adjusting the SP for object size
before or after the store/load, pointing to the object start or end byte,
pointing to the last object written or next object to write.
When you work through the permutations of the above,
taking into account various object sizes and possible
interrupts that might allow an overwrite of a stack value,
I believe you'll find that grow down with pre-decrement-then-store
requires 1 arithmetic op, no temp registers, is interrupt safe,
and leaves SP pointing to the start of the object.
SP = SP - size
store (SP, src, size)
load (SP, dst, size)
SP = SP + size
Also SP points to the first byte of the object and
can be used directly to access that object without
further indexing off SP which would require more cycles.
As a general observation it's all fine and dandy, but it does not apply to the original (i.e. 16-bit) x86. 16-bit x86 can't use SP as a "normal" address register: too few bits in ModRM, and until the 386 there was no SIB byte. To use SP as an address you had to do PUSH/POP/CALL/RET etc...
[snip]
EricP
2015-07-16 16:42:19 UTC
Permalink
Post by Michael S
As a general observation it's all fine and dandy, but it does not apply to the original (i.e. 16-bit) x86. 16-bit x86 can't use SP as a "normal" address register: too few bits in ModRM, and until the 386 there was no SIB byte. To use SP as an address you had to do PUSH/POP/CALL/RET etc...
I never used the 186 or 286 (skipped them), however I assume
they were backwards compatible with the 8086.
On the 8086 you were supposed to use BP for accessing the stack,
and BP used the stack segment as SP did, whereas BX, SI and DI
used data segment. e.g. intended use

push BP ; routine entry
mov BP, SP
...
mov reg, [BP+8] ; get stack arg
...
mov BX, SP ; get SP
mov reg, SS:[BX+offset]
...
pop BP ; routine exit
ret n

Eric
BGB
2015-07-16 22:24:41 UTC
Permalink
Post by EricP
Post by Michael S
As a general observation it's all fine and dandy, but it does not apply
to the original (i.e. 16-bit) x86. 16-bit x86 can't use SP as a "normal"
address register: too few bits in ModRM, and until the 386 there was no
SIB byte. To use SP as an address you had to do PUSH/POP/CALL/RET etc...
I never used 186 or 286 (skipped them) however
I assume they were backwards 8086 compatible.
On the 8086 you were supposed to use BP for accessing the stack,
and BP used the stack segment as SP did, whereas BX, SI and DI
used data segment. e.g. intended use
push BP ; routine entry
mov BP, SP
...
mov reg, [BP+8] ; get stack arg
...
mov BX, SP ; get SP
mov reg, SS:[BX+offset]
...
pop BP ; routine exit
ret n
yeah, 286 was basically just like the 8086, with the main difference
being that segments could point anywhere within a 24-bit address space.

generally, there was the GDT, which was intended for the OS to use for
segments, and the LDT that was IIRC intended mostly for
application-level segments (such as pointing to allocated memory regions
or large arrays).
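Tying the descriptor tables back to the thread's topic: here is a hedged Python sketch that packs the access byte of an i386-style *data* segment descriptor, which is where the expand-down bit lives. The field layout follows the i386 descriptor format; the function and argument names are mine:

```python
# Sketch: pack the access byte of an i386-style data-segment descriptor,
# including the expand-down (E) bit discussed in this thread.
# Access byte layout: P | DPL(2) | S | type(4); for data segments the
# type bits are 0 E W A.  Names are mine, not from any particular API.

def data_access_byte(present, dpl, expand_down, writable, accessed=False):
    assert 0 <= dpl <= 3
    byte = 0
    byte |= (1 if present else 0) << 7   # P: segment present
    byte |= dpl << 5                     # DPL: descriptor privilege level
    byte |= 1 << 4                       # S=1: code/data (not system)
    byte |= (1 if expand_down else 0) << 2  # E: expand-down
    byte |= (1 if writable else 0) << 1     # W: writable
    byte |= (1 if accessed else 0)          # A: accessed
    return byte

# A present, ring-0, expand-down, writable stack segment:
assert data_access_byte(True, 0, True, True) == 0x96
# The familiar flat ring-0 read/write data segment:
assert data_access_byte(True, 0, False, True) == 0x92
```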

most of the memory access was done via far pointers.


granted, I didn't really do that much development-wise on Win16, as I
was pretty young and new to programming at the time, and 16-bit was
basically on the way out at the time (after Win 3.11 and some continued
development mostly on MS-DOS and Win95, jumped ship mostly to Linux for
a few years, and migrated back into the Windows fold mostly using NT4
and 2K).

( Windows XP not sucking and Linux's continued issues with
sound/network/GPU issues at the time were enough basically to send me
back to using Windows a primary OS (for a while dual-booting, but VMware
in-turn killed the need to dual-boot). my secondary computers still
mostly run Linux though, and Windows hasn't done enough seriously
grievous to send me over the edge. yeah, Win8 sucks, but I am using
Win7, and had also largely skipped over Vista in favor of WinXP. )


IIRC, DI was often used against ES, but it depended on the instruction
whether it was DS:DI or ES:DI (most ops would use DS, but the string
instructions, MOVS and friends, used ES:DI for the destination).

typically, code would maintain ES=DS to help gloss over this issue.

but, yeah, BP was special in the sense that it defaulted to SS.

32-bit code made EBP partly redundant with ESP, but EBP remained fairly
common as the de-facto pointer for addressing function-arguments and
local variables, and for function backtraces.

then some compilers started omitting its use as a frame-pointer to free
up the register. similarly, its use is a bit hit-or-miss on x86-64 (even
though it would be convenient to keep it, if for no other reason than to
allow more reliable back-tracking of the call stack).
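The back-trace point can be illustrated with a toy model: each frame saves the caller's frame pointer at a known slot, so the saved values form a linked list that a debugger can walk. This is a sketch under an assumed frame layout, not any particular ABI:

```python
# Sketch: why keeping EBP/RBP as a frame pointer helps backtraces.
# Toy model: memory is a list, "addresses" are indices, and each frame
# stores [saved frame pointer, return address].  Layout is an
# assumption for illustration, not a real ABI.

def backtrace(mem, fp):
    """Walk the chain of saved frame pointers; mem[fp] holds the
    caller's frame pointer, mem[fp + 1] the return address."""
    frames = []
    while fp != 0:                 # 0 marks the outermost frame
        frames.append(mem[fp + 1]) # collect this frame's return address
        fp = mem[fp]               # follow the saved frame pointer
    return frames

mem = [0] * 32
mem[20], mem[21] = 0, 0x100   # outermost frame: saved fp = 0 (end)
mem[12], mem[13] = 20, 0x200  # middle frame links to 20
mem[4], mem[5] = 12, 0x300    # innermost frame links to 12
assert backtrace(mem, 4) == [0x300, 0x200, 0x100]
```

With frame-pointer omission, no such chain exists and an unwinder must fall back on side tables or heuristics, which is the "hit-or-miss" part.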
Rick C. Hodgin
2015-07-21 13:43:43 UTC
Permalink
I can't find any particularly overriding reasons to prefer one
design over the other, apart from the fact that, since neither
seems to offer a significant advantage, arbitrarily supporting
only one reduces design complexity somewhat.

I believe I'll support both in my architecture. When the selector
associated with the stack is marked expand-down, it will operate
that way. When it is marked expand-up, it will operate that way.
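For reference, the limit check that the expand-down bit flips (as on i386 data segments, and as proposed here per selector) can be sketched as follows. This is a simplified model that ignores granularity and takes the segment's top as a parameter (on i386 it depends on the B flag):

```python
# Sketch: the offset-validity check implied by the expand-down bit.
# Expand-up segment: valid offsets are 0..limit.
# Expand-down segment: valid offsets are limit+1..top.
# Simplified model -- granularity and the B flag are ignored.

def offset_valid(offset, limit, expand_down, top=0xFFFF):
    if expand_down:
        return limit < offset <= top
    return 0 <= offset <= limit

# Expand-up, limit 0x1FFF: low offsets valid, above the limit invalid.
assert offset_valid(0x0100, 0x1FFF, expand_down=False)
assert not offset_valid(0x2000, 0x1FFF, expand_down=False)
# Expand-down, same limit: the valid range flips to 0x2000..0xFFFF.
assert offset_valid(0x3000, 0x1FFF, expand_down=True)
assert not offset_valid(0x1000, 0x1FFF, expand_down=True)
```

This is why the expand-down encoding suits a downward-growing stack: the segment's limit can be raised later to grow the stack downward without moving any existing data.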

Best regards,
Rick C. Hodgin
Rick C. Hodgin
2015-07-23 01:37:36 UTC
Permalink
By the way, I have referred to the DNA Project Butterfly 40-bit CPU as
"the butterfly CPU." :-)

Wouldn't it be interesting if the example of the life cycle we see in
the butterfly is a preview of that which ultimately happens to the data
we have to date collected on DNA. For example, at first a caterpillar
comes out and all it does is eat and eat and grow and grow. Once it
gets to a certain size, it enters a phase of change. When it emerges
having changed, then this amazingly beautiful thing exists and goes
off to reproduce.

What if the information we've been gathering scientifically for all
these years is like the caterpillar stage, and we are about to go
into a phase of change whereby we consider that God really does exist,
and maybe these things are real, and when we emerge changed from that
contemplation, then this amazingly beautiful thing will exist, which
sends out a message from God to man, spawning copies all across the
Internet, across the world, in a language which everybody can
understand because God set it up that way.

It would be truly amazing. Truly, truly amazing.

Best regards,
Rick C. Hodgin