Discussion:
[fpc-devel] Re. z370 Cross Compilation, Pass 2 of ....
Mark Morgan Lloyd
2013-08-20 14:54:44 UTC
Permalink
Apologies that I've broken the thread, but some messages don't get
through our gateway - for some reason I'm seeing the message I'm quoting
arriving as an encoded attachment.

[Sven said]
If you'd now only explain what a cross-reference tool is I might even
understand what you're trying to tell me here...
I think what he means is that he wants a tool that will annotate each
procedure/function/method call in the source with the file and line
number that it transfers control to. That's obviously going to be a
problem in a language like Object Pascal which supports virtual methods,
where the actual target isn't known until execution time.
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that, as I understand it, experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, they
presumably prefer to avoid the GNU linker and related tools as well.
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system without
a stack as understood today. Parameters are mostly passed in registers,
but this is largely invisible since supervisor calls etc. are usually
wrapped in macros.

My own feeling is that it would be best to start targeting a late-model
390, which does have a stack etc., and to use the standard GNU assembler
and linker (as and ld) /initially/ targeting Linux. Any other
combination (i.e. a proprietary assembler etc. with an antique MVS as
target) is going to cause nothing but grief, since it makes it very
difficult for developers skilled with FPC but not with IBM mainframes to
give any practical help.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Sven Barth
2013-08-20 18:15:55 UTC
Permalink
Post by Mark Morgan Lloyd
Apologies that I've broken the thread, but some messages don't get
through our gateway- for some reason I'm seeing the message I'm quoting
arriving as an encoded attachment.
[Sven said]
If you'd now only explain what a cross-reference tool is I might even
understand what you're trying to tell me here...
I think what he means is that he wants a tool that will annotate each
procedure/function/method call in the source with the file and line
number that it transfers control to. That's obviously going to be a
problem in a language like Object Pascal which supports virtual methods,
where the actual target isn't known until execution time.
Ah ok. In that case it's indeed not a good idea to do, especially with
the compiler, whose code generator relies heavily on virtual methods
(and exactly this part is what Paul is interested in). For the parser it
might work, however, but there you could also add a
"Writeln({$I %file%}, ' ', {$I %line%});" at the beginning of each
function :P
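For illustration, a minimal sketch of what such an instrumented routine
could look like (the procedure name is of course invented; the %FILE% and
%LINE% insert macros expand at compile time to string literals):

  { Sketch only - ParseStatement is an invented name. }
  procedure ParseStatement;
  begin
    { expands at compile time to the current source file name and line }
    Writeln({$I %FILE%}, ' ', {$I %LINE%});
    { ... the real parsing work would follow here ... }
  end;
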
Post by Mark Morgan Lloyd
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that as I understand it experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, that
would imply that they prefer to avoid the GNU linker and related tools
as well.
A compiler is something different from an "experienced IBM programmer".
If a programmer who writes assembly code directly tries to shortcut
things through macros, that is one thing, but for a compiler it doesn't
matter that much whether it can make use of the macros or not. Let's
assume there is an assembler macro for the System.Move functionality. An
IBM programmer might prefer to use that, while the compiler has no
problem with translating the Pascal implementation of System.Move to
"low level" s370.
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system without
a stack as understood today. Parameters are mostly passed in registers,
but this is largely hidden since supervisor calls etc. are usually
hidden in macros.
My own feeling is that it would be best to start targeting a late-model
390, which does have a stack etc., and to use the standard GNU assembler
and linker (as and ld) /initially/ targeting Linux. Any other
combination (i.e. a proprietary assembler etc. with an antique MVS as
target) is going to cause nothing but grief, since it makes it very
difficult for developers skilled with FPC but not with IBM mainframes to
give any practical help.
Steve
2013-08-21 20:17:39 UTC
Permalink
Post by Mark Morgan Lloyd
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that as I understand it experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, that
would imply that they prefer to avoid the GNU linker and related tools
as well.
There is a problem inherent in this discussion; zArch is not one environment!
It's one architecture supporting multiple operating systems, much like the
386/486/586/686 etc. supports Linux or DOS or OS/2 or Windows (all 37 versions)
etc. zArch has MVS (in all its varieties), VM, DOS, Linux, MFT, MVT, MUSIC/SP,
and loads of other more niche stuff. In addition to all this, later versions of
MVS supply a POSIX-compliant shell called OMVS. GNU anything is available in
hardly any of these environments, even if we can handle the brain-dead assembler.
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system without
a stack as understood today. Parameters are mostly passed in registers,
but this is largely hidden since supervisor calls etc. are usually
hidden in macros.
I am an MVS person so I can't speak for the other lot, but parameter passing
is mostly done in storage. The standard linkage conventions used allow for
two 32-bit signed or unsigned integers (64-bit in later models). Anything
else is passed in a storage area pointed to by register 1. Where this storage
area comes from is complex and variable. My guess is that the other IBM systems
have similar models. A further guess is that Linux-based code allocates an
area of storage, points a register at it, writes a couple of Mickey Mouse
macros and bingo, a stack.
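If it helps the Pascal people picture it, here is a rough sketch of that
convention in Pascal terms (purely illustrative; the real layout is defined
by the OS linkage conventions, and by tradition the high-order bit of the
last address marks the end of a variable-length list):

  type
    PParmList = ^TParmList;
    { R1 points at a list of 31-bit argument addresses, one per argument }
    TParmList = array[0..63] of Cardinal;

  function NthArgument(R1: PParmList; N: Integer): Pointer;
  begin
    { strip the high-order "last parameter" flag before using the address }
    NthArgument := Pointer(R1^[N] and $7FFFFFFF);
  end;
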
Post by Mark Morgan Lloyd
My own feeling is that it would be best to start targeting a late-model
390, which does have a stack etc., and to use the standard GNU assembler
and linker (as and ld)/initially/ targeting Linux. Any other
combination (i.e. a proprietary assembler etc. with an antique MVS as
target) is going to cause nothing but grief, since it makes it very
difficult for developers skilled with FPC but not with IBM mainframes to
give any practical help.
Late-model 390s have a stack, but not as you know it. It's not something
you can go around lobbing arbitrary data at. It is reserved for data saved
during subroutine linkage using the appropriate hardware instruction (Branch
and Stack). This includes various register sets, PSW status info etc., and an
additional two 32-bit signed or unsigned integers (64-bit in later models).

Steve
Mark Morgan Lloyd
2013-08-21 21:44:45 UTC
Permalink
Post by Steve
Post by Mark Morgan Lloyd
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that as I understand it experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, that
would imply that they prefer to avoid the GNU linker and related tools
as well.
There is a problem inherent in this discussion; zArch is not one environment!
It's one architecture supporting multiple operating systems, much like the
386/486/586/686 etc supports Linux or DOS or OS/2 or Windows (all 37 versions)
etc. zArch has MVS (in all it's varieties) VM, DOS, LINUX, MFT, MVT, MUSIC/SP,
and loads of other more niche stuff. In addition to all this, later versions of
MVS supply a POSIX compliant shell called OMVS. GNU anything is available in
hardly any of these environments even if we can handle the brain-dead assembler.
GCC plus the basic utilities have definitely been ported to VM/370 and
MUSIC/SP (i.e. I've run them), and as far as I know to the others. In
all cases possibly subject to memory restrictions, i.e. large jobs might
need more than the standard 16MB supplied by older operating systems
(that's where the /380 patch to Hercules comes into it).
Post by Steve
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system without
a stack as understood today. Parameters are mostly passed in registers,
but this is largely hidden since supervisor calls etc. are usually
hidden in macros.
I am an MVS person so I can't speak for the other lot but parameter passing
is mostly done in storage. The standard linkage conventions used allow for
two 32-bit signed or unsigned integers (64-bit in later models). Anything
else is passed in a storage area pointed to by register 1. Where this storage
area comes from is complex and variable. My guess that the other IBM systems
have similar models. A further guess is that Linux based code allocates an
area of storage, points a register at it, write a couple of mickey mouse
macros and bingo, a stack.
Don't guess; see http://wiki.lazarus.freepascal.org/ZSeries which, apart
from anything else, has links to the original GCC implementation notes
which in part determine what architecture variants Linux supports.
However, there also appear to be later ports which support older CPU
variants, see link below.
Post by Steve
Post by Mark Morgan Lloyd
My own feeling is that it would be best to start targeting a late-model
390, which does have a stack etc., and to use the standard GNU assembler
and linker (as and ld)/initially/ targeting Linux. Any other
combination (i.e. a proprietary assembler etc. with an antique MVS as
target) is going to cause nothing but grief, since it makes it very
difficult for developers skilled with FPC but not with IBM mainframes to
give any practical help.
Late-model 390's have a stack, but not as you know it. It's not something
you can go around lobbing arbitrary data at. It is reserved for data saved
during subroutine linkage using the appropriate hardware instruction (Branch
and Stack). This includes various register sets, PSW status info etc) and an
additional two 32-bit signed or unsigned integers (64-bit in later models).
The examples at
http://wiki.lazarus.freepascal.org/Assembler_and_ABI_Resources#zSeries_.28S.2F390.29
show a significant simplification of the code when a newer architecture
is specified.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-01 03:41:37 UTC
Permalink
Post by Steve
Post by Mark Morgan Lloyd
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that as I understand it experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, that
would imply that they prefer to avoid the GNU linker and related tools
as well.
There is a problem inherent in this discussion; zArch is not one environment!
It's one architecture supporting multiple operating systems, much like the
386/486/586/686 etc supports Linux or DOS or OS/2 or Windows (all 37 versions)
etc. zArch has MVS (in all it's varieties) VM, DOS, LINUX, MFT, MVT, MUSIC/SP,
and loads of other more niche stuff. In addition to all this, later versions of
MVS supply a POSIX compliant shell called OMVS. GNU anything is available in
hardly any of these environments even if we can handle the brain-dead assembler.
This is correct, but it should be possible to cover most of those systems
with one RTL - much the same way as LE (Language Environment) does it in
the "big" IBM world, too.
Post by Steve
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system without
a stack as understood today. Parameters are mostly passed in registers,
but this is largely hidden since supervisor calls etc. are usually
hidden in macros.
I am an MVS person so I can't speak for the other lot but parameter passing
is mostly done in storage. The standard linkage conventions used allow for
two 32-bit signed or unsigned integers (64-bit in later models). Anything
else is passed in a storage area pointed to by register 1. Where this storage
area comes from is complex and variable. My guess that the other IBM systems
have similar models. A further guess is that Linux based code
allocates an
area of storage, points a register at it, write a couple of mickey mouse
macros and bingo, a stack.
Ok. So what? This makes parameter passing just easier, because you don't
have to deal with special cases for - say - a low number of parameters.
It's all the same method. And Pascal has call by value and call by
reference and - for example - not the strange descriptor and dope vector
things that PL/1 has. The stack is simulated by simply incrementing and
decrementing the stack pointer on procedure entry and return and
addressing relative to it. At least that is what is done in the
"classical" compilers.

I recently learned a bit about the MIPS architecture, and it seemed to me
to have some similarities to IBM's, so I guess a kind of "translation"
from MIPS to IBM could be successful. Many people say that the IBM
platform is the only remaining platform today that can reasonably be
programmed using assembly language, because of the clear structure of the
instruction set.
Post by Steve
Post by Mark Morgan Lloyd
My own feeling is that it would be best to start targeting a late-model
390, which does have a stack etc., and to use the standard GNU assembler
and linker (as and ld)/initially/ targeting Linux. Any other
combination (i.e. a proprietary assembler etc. with an antique MVS as
target) is going to cause nothing but grief, since it makes it very
difficult for developers skilled with FPC but not with IBM mainframes to
give any practical help.
Late-model 390's have a stack, but not as you know it. It's not something
you can go around lobbing arbitrary data at. It is reserved for data saved
during subroutine linkage using the appropriate hardware instruction (Branch
and Stack). This includes various register sets, PSW status info etc) and an
additional two 32-bit signed or unsigned integers (64-bit in later models).
Steve
I would strongly suggest not to use the "new" hardware stack, but instead
to rely on the OS linkage conventions (save area chaining, register 13),
which in fact are used by most of the operating systems mentioned above -
with minor differences.
Mark Morgan Lloyd
2013-09-01 09:01:51 UTC
Permalink
Post by Bernd Oppolzer
Post by Steve
Post by Mark Morgan Lloyd
Otherwise we also rely on external tools (mostly the GNU linker)
here. So as a first step you'd choose the approach of using an
external assembler and linker, because simply calling a third party
utility is easier than completely implementing an internal assembler
and linker.
With the caveat here that as I understand it experienced IBM programmers
avoid the GNU assembler like the plague, since it doesn't have anything
like the sort of macro facilities they're used to. By implication, that
would imply that they prefer to avoid the GNU linker and related tools
as well.
There is a problem inherent in this discussion; zArch is not one environment!
It's one architecture supporting multiple operating systems, much like the
386/486/586/686 etc supports Linux or DOS or OS/2 or Windows (all 37 versions)
etc. zArch has MVS (in all it's varieties) VM, DOS, LINUX, MFT, MVT, MUSIC/SP,
and loads of other more niche stuff. In addition to all this, later versions of
MVS supply a POSIX compliant shell called OMVS. GNU anything is available in
hardly any of these environments even if we can handle the brain-dead assembler.
This is correct, but it should be possible to cover most of those
systems with
one RTL - much the same way as LE (language environment) does it in the
"big" IBM world, too.
I suspect that targeting "classic" OSes which use EBCDIC would need a
branch of the RTL, even if the source code in that branch were in ASCII.
I also suspect that - however attractive this is - trying to port the
compiler so that it would self-host on EBCDIC-based OSes would
effectively fork the project, unless somebody could come up with a neat
hack to automatically convert patches back to ASCII when they're sent to
the Subversion repository.

Thought for everybody even remotely interested: if we could standardise
on the first line of every unit being a non-essential comment (and
specifically, not a compiler directive) then it would be possible for
the parser to decide whether it were looking at ASCII or EBCDIC before
hitting anything important.
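Something along these lines, say (the byte values are what I'd expect for
a leading '{' or '(' comment in ASCII versus EBCDIC code page 037, so
treat them as needing verification):

  type
    TSourceEncoding = (seAscii, seEbcdic, seUnknown);

  function GuessEncoding(FirstByte: Byte): TSourceEncoding;
  begin
    case FirstByte of
      $7B, $28: GuessEncoding := seAscii;   // '{' or '(' in ASCII
      $C0, $4D: GuessEncoding := seEbcdic;  // '{' or '(' in EBCDIC CP 037
    else
      GuessEncoding := seUnknown;
    end;
  end;
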
Post by Bernd Oppolzer
compilers. I recently learned a bit about the MIPS architecture, and it
seemed to
me to have some similarities to IBM's, so I guess, a kind of
"translation" from MIPS
to IBM could be successful.
A post-compilation translation probably wouldn't work since, e.g.,
different register ranges are used for parameter passing, but I was
wondering whether the MIPS directory could be moved en masse. But again,
note Florian's warning about this.
Post by Bernd Oppolzer
to IBM could be successful. Many people say that the IBM platform is the
only
remaining platform today that reasonably can be programmed using
assembly language,
because of the clear structure of the instruction set.
In fairness, the x86 probably falls into that category as well. However
I think most people abandoned x86 assembler when both Linux and the NT
family of OSes demonstrated that C was adequate for portable device
drivers etc.
Post by Bernd Oppolzer
I would strongly suggest not to use the "new" hardware stack, but
instead to
rely on the OS linkage conventions (save area chaining, register 13), which
in fact are used by most of the operating systems mentioned above - with
minor differences.
Noted, but I think we have to go along with Linux practice for the
initial port.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Steve
2013-08-24 18:50:05 UTC
Permalink
First, let me apologise for that last email. The subject and the formatting
were awful. I blame my email client, not the fact that I am incompetent. The
latter is a calumny that I will defend to the utter limits of the law. :) At
least it wasn't HTML.

(Brain the size of a planet but thunderbird just totally baffles me I'm
afraid)

To business...
If 360 assembly code can be used on modern processor variants as well I
see no problem with targeting that at first only.
Modern processors provide new instructions that would certainly make life
much easier, but IBM put a load of effort into keeping forward compatibility
intact. This is true even at object code level. I have worked in
environments that lifted Fortran compilers that were 25 years old and ran
them successfully on modern OSes without recompilation. The only fly in the
ointment is Linux (isn't it always the case!). IBM do not supply an
assembler for Linux. They use gas. But there are "proper" assemblers available.
The point with using Linux as a first target is that you would not need
to implement a new RTL, but only the code generator and the processor
specific RTL code.
Hmmm! I think it will be more complex than that... Certainly, anything
related to I/O will need to be drastically rethought, even before it's
rewritten. The I/O architecture is just too different.
The point for gas/ld was simply that we have existing writers for these
two, but writing your own writers for IBM specific tools isn't rocket
science either... But it's another thing you'd need to implement.
You don't have existing writers! They may be familiar with the basic syntax
that gas/ld use, but IBM assembler - whether gas, Assembler F or High Level
Assembler - has a totally different vocabulary. The semantics are also
unique. CALL exists but is different; RET doesn't exist. JMP and its
variants are called B (B for Branch) and the variants are different. What is
the difference between L and LA? What are L and LA? MOV doesn't exist;
there's a whole host of different instructions for moving data depending
upon source, destination and format. Do they understand what the 16M line
and the 2G bar are and their implications for code? And a whole other list
longer than my arm. The manual that defines the Assembler language is, at
last count, 1292 pages of A4! All of it different from what your pool of gas
coders is familiar with, unless they have previously written IBM Assembler,
in which case they probably used Assembler F or one of its successors.

And it's not just the assembly language that's different; the target
hardware bears absolutely no relation to what they are used to, apart from
the fact that it uses 8-bit bytes. The I/O subsystem is radically different,
even the character set is different.
If we use gas/ld we put ourselves in the same situation Delphi landed
us in with the use of non-standard libraries; Writing individual
wrapper functions for every other function under the sun.
Would you elaborate what you mean here?
Sorry, isn't this a universal problem, or is it just me? Anyway... If I am
writing code in Delphi and I want to use, say, MySQL, I have a problem:
Delphi provides no mechanism to call MySQL. If I am a C programmer, I just
include the .h file and MySQL is available. Instead, I have to troll round
the internet looking for a component or set of components to provide this
support. Often, especially in the early years of Delphi, there wasn't one. So
I have to write my own. I sit down with the manual and the MySQL.h header
file and reverse-engineer the relevant code from its C equivalent. I bought
Delphi so I wouldn't have to mess around with C. I hate C. I consign it to
the same hell that 380 belongs in... No, I consign it to the lowest levels of
hell. (Deep breath).

But everything is defined in terms of C and ultimately has to be
reverse-engineered into a Pascal equivalent. Wouldn't it be nice if I could
say "Uses MySQL;" and away it goes? If it can't find a Delphi unit it looks
for MySQL.lib or MySQL.h or whatever and generates the code for me.

I'm sure you're aware of the process, and, yes, I am aware of the problems,
but you asked me to elaborate.

The same thing exists in MVS, and the other OSes too but I am an MVS man so
I will stick to that. If I want to link to a system facility or another
product there are a number of different ways it can be achieved.

1) Use a macro. All base OS facilities are defined by macro interfaces. Some
of the actual binary interfaces are documented. Some are not. Without a
varying amount of reverse engineering, gas won't allow us to use them. You
called them a "shortcut" in another post. Sometimes they are; sometimes they
are the only sensible way to go, because the long way round is far too
complex. To take one example: memory management in MVS has 4 or 5 (I think)
Supervisor Calls to allocate and deallocate a block of memory. Which one is
required depends on the parameters passed to the macro; it is not a
trivial task. Gas doesn't allow macros.

2) Use CALL to link to an interface module. In these cases the CALL
interface is documented, but the actual code to link to the product is
complex. It may involve authorisation changes or address space switches or
finding addresses of object-code-only storage areas. The CALLed code is
linked automatically by the linkage editor. ld can't read MVS load libraries.

3) DB2 and CICS, along with IMS, are the war-horses of MVS application
development. They use a pre-processor. The assembler source code is fed into
a pre-processor, which generates the required assembler, and the output is
fed into the assembler. The pre-processors don't understand gas assembler,
except, possibly, the latest and greatest versions. These interfaces
are complex command-oriented ones and the assembler equivalents
are not documented.

Elaborate enough? :)

Regards,
Steve
Hans-Peter Diettrich
2013-08-24 19:40:15 UTC
Permalink
Post by Steve
If we use gas/ld we put ourselves in the same situation Delphi landed
us in with the use of non-standard libraries; Writing individual
wrapper functions for every other function under the sun.
Would you elaborate what you mean here?
Sorry, isn't this a universal problem, or is it just me? Anyway.. If I am
writing code in Delphi and I want to use, say, MySQL; I have a problem;
Delphi provides no mechanism to call MySQL. If I am a C programmer, I just
include the .h file and MySQL is available.
This is not really true. I have a C-to-Pascal translator that
translates the header files perfectly. Well, some constructs (with
unions) cannot be translated as they are declared, but these are rare
and do not affect API calls, only subsequent access to the struct fields.

But that translator requires the right header files of the compiler used
to translate the library, the right compiler settings, etc. If you didn't
compile the library on your system, even a C compiler may fail to
properly connect to the library ABI. This is not a big problem on
systems with only one C compiler and library, but how many C compilers
exist for Windows? It is even worse with C++, where no compiler produces
binaries usable with a different C++ compiler.

In fact my translator can convert C code of many compilers into OPL,
where some compilers require very special translation. But configuring
it is so complicated that only a few people ever used it. Most are happy
with h2pas or similar tools, which only deal with header files and can
produce unusable results in certain cases. You have the choice...

DoDi
Mark Morgan Lloyd
2013-08-24 19:44:38 UTC
Permalink
Post by Steve
The point with using Linux as a first target is that you would not need
to implement a new RTL, but only the code generator and the processor
specific RTL code.
Hmmm! I think it will be more complex than that... Certainly, anything
related to I/O will need to be drastically rethought, even before it's
rewritten. The I/O architecture is just too different.
It's not. Linux on 390 (caveat: my experience is on Hercules, but I have
no reason to believe that the distro I was using had been tailored for
emulation) looks like Linux on SPARC, Linux on MIPS, Linux on PPC... and
so on. The only significant departure is that they don't see Ethernet
hardware, but instead talk SLIP to a gateway provided by Hercules or use
some sort of offload engine.
Post by Steve
The point for gas/ld was simply that we have existing writers for these
two, but writing your own writers for IBM specific tools isn't rocket
science either... But it's another thing you'd need to implement.
You don't have existing writers! They may be familiar with the basic syntax
that gas/ld use, but IBM assembler, gas, Assembler F or High Level Assembler
has a totally different vocabulary. The semantics are also unique. CALL
exists but is different, RET doesn't exist. JMP and it's variants are called
B (B for Branch) and the variants are different. What is the difference
between L and LA? What are L and LA? MOV doesn't exist, there a whole host
of different instructions for moving data depending upon source, destination
and format. Do they understand what the 16M line and the 2G bar are and
their implications for code? And a whole other list longer than my arm. The
manual that defines the Assembler language is, at last count, 1292 pages of
A4! All of it different from what your pool of gas coders is familiar with,
unless they have previously written IBM Assembler, in which case they
probably used Assembler F or one of it's successors.
STOP. I think that between us we've got a terminology problem: Sven
wasn't using "writer" to refer to somebody writing assembler code, but
to an object embedded in the compiler that is tailored for emitting
assembler statements.

In other words, when we say that an existing assembler writer outputting
gas-format statements for (say) MIPS or 68K can probably be adapted
without too much trouble for s390, we aren't referring to coders on the
development team but to a body of code that comprises a significant
proportion of the compiler's backend.

I'm skipping over the remainder of the message with minimal comment,
since the significant detail is that if the initial FPC implementation
is for Linux then it will obviously have to have the ability to link to
standard libraries on the same system.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Sven Barth
2013-08-24 21:43:39 UTC
Permalink
Post by Mark Morgan Lloyd
Post by Steve
The point with using Linux as a first target is that you would not
need
to implement a new RTL, but only the code generator and the processor
specific RTL code.
Hmmm! I think it will be more complex than that... Certainly, anything
related to I/O will need to be drastically rethought, even before it's
rewritten. The I/O architecture is just too different.
It's not. Linux on 390 (caveat: my experience is on Hercules, but I have
no reason to believe that the distro I was using had been tailored for
emulation) looks like Linux on SPARC, Linux on MIPS, Linux on PPC... and
so on. The only significant departure is that they don't see Ethernet
hardware, but instead talk SLIP to a gateway provided by Hercules or use
some sort of offload engine.
Exactly. On Linux we'd simply use the Linux I/O API which is already
implemented in FPC. This is why it's a good idea to use Linux at first:
less work on the RTL.
Post by Mark Morgan Lloyd
Post by Steve
The point for gas/ld was simply that we have existing writers for
these
two, but writing your own writers for IBM specific tools isn't rocket
science either... But it's another thing you'd need to implement.
You don't have existing writers! They may be familiar with the basic syntax
that gas/ld use, but IBM assembler, gas, Assembler F or High Level Assembler
has a totally different vocabulary. The semantics are also unique. CALL
exists but is different, RET doesn't exist. JMP and it's variants are called
B (B for Branch) and the variants are different. What is the difference
between L and LA? What are L and LA? MOV doesn't exist, there a whole host
of different instructions for moving data depending upon source, destination
and format. Do they understand what the 16M line and the 2G bar are and
their implications for code? And a whole other list longer than my arm. The
manual that defines the Assembler language is, at last count, 1292 pages of
A4! All of it different from what your pool of gas coders is familiar with,
unless they have previously written IBM Assembler, in which case they
probably used Assembler F or one of it's successors.
STOP. I think that between us we've got a terminology problem: Sven
wasn't using "writer" to refer to somebody writing assembler code, but
to an object embedded in the compiler that is tailored for emitting
assembler statements.
In other words, when we say that an existing assembler writer outputting
gas-format statements for (say) MIPS or 68K can probably be adapted
without too much trouble for s390, we aren't referring to coders on the
development team but to a body of code that comprises a significant
proportion of the compiler's backend.
Exactly. Thank you for explaining. :)
Post by Mark Morgan Lloyd
Elaborate enough?
Yes, it was. But Pascal is a niche language currently, so we need to be
grateful already that we can use C libraries without many problems...
also FPC comes with some header conversions already included, like MySQL,
including a database connection wrapper for TDataset ;)

Regards,
Sven
Bernd Oppolzer
2013-09-01 03:19:56 UTC
Permalink
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system
without a stack as understood today. Parameters are mostly passed in
registers, but this is largely hidden since supervisor calls etc. are
usually hidden in macros.
My own feeling is that it would be best to start targeting a
late-model 390, which does have a stack etc., and to use the standard
GNU assembler and linker (as and ld) /initially/ targeting Linux. Any
other combination (i.e. a proprietary assembler etc. with an antique
MVS as target) is going to cause nothing but grief, since it makes it
very difficult for developers skilled with FPC but not with IBM
mainframes to give any practical help.
Sorry, I don't follow some of the comments in this thread.

First, what is a stack? A stack IMO consists of a stack pointer provided
by the hardware, that is, a register. And even the old IBM machines have
many registers that can be seen as stack pointers; you only have to select
one. That is in fact the way that the stack has been implemented in "old"
compilers on "old" IBM hardware, be it Fortran, Pascal, PL/1 or C. All
those languages needed a kind of stack for local variables of procedures
etc. So IMO there is no need to target new hardware; this will be very
expensive and excludes the use of Hercules simulators and free versions of
IBM operating systems for first tests. Even today's IBM compilers don't
use the "new" hardware stack.

And then: I have some doubts about the linkage between FPC and the GNU
tools, like as and ld. Why is it easier to port FPC to a Linux-based
system? If the compiler translates into an abstract intermediate language
first and then into an abstract assembly language - maybe for an abstract
machine like the P-machine - then the nature of the assembler and linker
used should be irrelevant. Maybe there is some misunderstanding on my - or
our - part; I have the Wirth compilers in mind, and there is a clear
separation between the machine-independent parts - phase 1 - which
generate P-code, and the machine-dependent parts - phase 2 and runtime.
Even if there is no such separation in FPC, it should IMO be possible to
develop and test the code generation separately.

I, too, had difficulties, like Paul Robinson, in that I did not get the
cross-compiler working. My goal was, for example, to have a cross-compiler
running on Windows that produces assembler output for MIPS, for example,
and for a second target, S370, which at the beginning is simply a copy of
MIPS producing the identical output; then I could make changes to the S370
code generation and try to get the results running on simulated or real
370 hardware.
Could you maybe outline the steps that are necessary to create such an
environment?

Kind regards

Bernd
Hans-Peter Diettrich
2013-09-01 05:10:49 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system
without a stack as understood today. Parameters are mostly passed in
registers, but this is largely hidden since supervisor calls etc. are
usually hidden in macros.
My own feeling is that it would be best to start targeting a
late-model 390, which does have a stack etc., and to use the standard
GNU assembler and linker (as and ld) /initially/ targeting Linux. Any
other combination (i.e. a proprietary assembler etc. with an antique
MVS as target) is going to cause nothing but grief, since it makes it
very difficult for developers skilled with FPC but not with IBM
mainframes to give any practical help.
Sorry, I don't follow some of the comments in this thread.
First, what is a stack? A stack IMO consists of a stack pointer provided
by the
hardware, that is, a register. And even the old IBM machines have many
registers
that can be seen as stack pointers; you only have to select one.
You are not free to select one register for that purpose, unless it's
guaranteed (by calling conventions) that all software preserves this
register.
Post by Bernd Oppolzer
That is
in fact the
way that the stack has been implemented in "old" compilers on "old" IBM
hardware,
be it Fortran, Pascal, PL/1 or C. All those languages needed kind of
stacks for
local variables of procedures etc. So IMO there is no need to target new
hardware;
this will be very expensive and excludes the use of Hercules simulators and
free versions of IBM operating systems for first tests. Even todays IBM
compilers
don't use the "new" hardware stack.
Such subroutines require three addresses: to their parameters, to their
local variables, and a return address. The latter cannot be chosen freely;
it's implemented in machine CALL/RETURN instructions, which manage a kind
of *return stack*. Where these instructions predate the hardware stack,
the return addresses obviously have been managed in some different way,
not using the new hardware stack. This means that a calling convention
must be established, for passing subroutine arguments and return values,
for local storage allocation, and for stack unwinding. Some convention may
already exist, for calling system services or library functions. Apart
from that, every compiler can choose its own model for further (private)
calling conventions. In most cases a single register (base or frame
pointer) is used for that purpose, which *can* point to some stack
location, but it can also point to any other convenient memory frame.
Let's call it the *data stack* for disambiguation.

Both kinds of stacks become very important when a debugger is added. The
debugger must not only know about the return stack, which is
machine-specific, but it also must know about the calling conventions, how
to access subroutine parameters and local variables, and how these frames
are linked together during subroutine calls. This means that, if an
already existing debugger is to be used, the available calling
conventions are already specified!

DoDi
Florian Klämpfl
2013-09-01 07:52:28 UTC
Permalink
Post by Bernd Oppolzer
My goal was, for example, to have a cross compiler running on
Windows, that produces Assembler output for MIPS for example, and
for a second target S370, which is at the beginning simply a copy of MIPS,
producing the identical output, but then I could make changes to the S370
code generation and try to get the results running on a simulated or
real 370 hardware.
Could you maybe outline the steps that are necessary to create such an
environment?
This will not help you because you won't learn anything from it. As said previously, look at the steps by which I started the aarch64 port and try to do the same for S370 or whatever.
Mark Morgan Lloyd
2013-09-01 08:46:21 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
Just to name a few: you'll need to get parameter passing for functions
correctly
Which leads to another issue: the 370 is a register-based system
without a stack as understood today. Parameters are mostly passed in
registers, but this is largely hidden since supervisor calls etc. are
usually hidden in macros.
My own feeling is that it would be best to start targeting a
late-model 390, which does have a stack etc., and to use the standard
GNU assembler and linker (as and ld) /initially/ targeting Linux. Any
other combination (i.e. a proprietary assembler etc. with an antique
MVS as target) is going to cause nothing but grief, since it makes it
very difficult for developers skilled with FPC but not with IBM
mainframes to give any practical help.
Sorry, I don't follow some of the comments in this thread.
First, what is a stack? A stack IMO consists of a stack pointer provided
by the
hardware, that is, a register. And even the old IBM machines have many
registers
that can be seen as stack pointers; you only have to select one. That is
in fact the
way that the stack has been implemented in "old" compilers on "old" IBM
hardware,
be it Fortran, Pascal, PL/1 or C. All those languages needed kind of
stacks for
local variables of procedures etc. So IMO there is no need to target new
hardware;
this will be very expensive and excludes the use of Hercules simulators and
free versions of IBM operating systems for first tests. Even todays IBM
compilers
don't use the "new" hardware stack.
I think that an initial port to Linux (targeting a late-model 390
emulated by Hercules) is unavoidable, and that also has the advantage of
being of immediate use to the wider community. That implies that it has
to be able to use the standard calling convention on that platform, as
excerpted at
http://wiki.lazarus.freepascal.org/Assembler_and_ABI_Resources#zSeries_.28S.2F390.29_2
Also see the assembler examples at
http://wiki.lazarus.freepascal.org/Assembler_and_ABI_Resources#zSeries_.28S.2F390.29
which show actual assembler sequences generated (one for the Linux
kernel, others for a "Hello, World").

So at the very least, we have to consistently simulate a stack - apart
from anything else, that's mandated by Pascal's use of recursion. But we
don't necessarily have to use the same calling convention for Linux and
for "classic" OSes (i.e. including those which are freely available,
running on Hercules etc.).
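(The textbook illustration of why: every active invocation below needs its
own copy of N and its own return address, so there has to be a per-call
frame somewhere, however it's obtained.)

  function Factorial(N: Cardinal): Cardinal;
  begin
    if N <= 1 then
      Factorial := 1
    else
      Factorial := N * Factorial(N - 1);   { each level needs its own frame }
  end;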

Question (to save me digging into the manuals right now): where a recent
machine uses the dedicated stack instructions, is the stack pointer one
of the standard registers? In other words, can push/pop operations be
trivially and exactly simulated for older hardware?
Post by Bernd Oppolzer
And then: I have some doubts about the linkage between FPC and the GNU
tools,
like as and ld. Why is it easier to port FPC to a Linux based system?
Because the FPC sources are ASCII, all the RTL code assumes ASCII
(including collation order, behaviour of Pred() and Succ(), and so on)
and - most importantly - because almost all of the developers are either
using it or are familiar with it by necessity.
Post by Bernd Oppolzer
If the compiler translates into an abstract intermediate language first
and then into an abstract assembly language maybe - for an abstract machine
like the P-machine - then the nature of the assembler and linker used
should be irrelevant. Maybe there is some misunderstanding on my - or
our - part;
Yes, but the compiler /doesn't/. The one possible getout here would be
to use one of the variants that already generates Java bytecodes or
(potentially) something like WebJS, but that would immediately exclude
any of the "classic" OSes unless a custom backend were written from scratch.
Post by Bernd Oppolzer
I have the Wirth compilers in mind, and there is a clear separation between
the machine independent parts - phase 1 - which generates P-code and
the machine dependent parts - phase 2 and runtime. Even if there is no such
separation in FPC, it should IMO be possible to develop and test the code
generation separately.
I don't think that would work; there are too many quasi-circular
dependencies (I'm intermittently wrestling with a potential
code-generation problem, and hoo-boy).
Post by Bernd Oppolzer
I, too, had the difficulties, like Paul Robinson, that I did not get the
cross-compiler
working. My goal was, for example, to have a cross compiler running on
Windows, that produces Assembler output for MIPS for example, and
for a second target S370, which is at the beginning simply a copy of MIPS,
producing the identical output, but then I could make changes to the S370
code generation and try to get the results running on a simulated or
real 370 hardware.
Could you maybe outline the steps that are necessary to create such an
environment?
I agree that the MIPS target is very similar, including Linux calling
conventions etc. (noting that there are several possible calling
conventions, and that information describing these might only be
available under NDA).

However, note Florian's comment about there being an optimal sequence for
bringing things up, and also note that Sven has recently worked on the
68K port, which might be sufficiently similar to be relevant.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Hans-Peter Diettrich
2013-09-01 18:48:20 UTC
Permalink
Post by Mark Morgan Lloyd
So at the very least, we have to consistently simulate a stack- apart
from anything else, that's mandated by Pascal's use of recursion. But we
don't necessarily have to use the same calling convention for Linux and
for "classic" OSes (i.e. including those which are freely-available,
running on Hercules etc.).
See my note on the OS specific/supplied debugger.
Post by Mark Morgan Lloyd
Question (to save me digging into the manuals right now): where a recent
machine uses the dedicated stack instructions, is the stack pointer one
of the standard registers? In other words, can push/pop operations be
trivially and exactly simulated for older hardware?
You mean thread safety?

As long as only one thread is running, the push/pop instructions must
not be atomic.

Multiple threads introduce many more problems, because their return
stacks must never get mixed. Furthermore each thread must have its own
stack, again no conflicts.

DoDi
Mark Morgan Lloyd
2013-09-01 19:42:54 UTC
Permalink
Post by Hans-Peter Diettrich
Post by Mark Morgan Lloyd
So at the very least, we have to consistently simulate a stack- apart
from anything else, that's mandated by Pascal's use of recursion. But
we don't necessarily have to use the same calling convention for Linux
and for "classic" OSes (i.e. including those which are
freely-available, running on Hercules etc.).
See my note on the OS specific/supplied debugger.
Post by Mark Morgan Lloyd
Question (to save me digging into the manuals right now): where a
recent machine uses the dedicated stack instructions, is the stack
pointer one of the standard registers? In other words, can push/pop
operations be trivially and exactly simulated for older hardware?
You mean thread safety?
No, I meant that Bernd suggested R1 earlier as a simulated stack
pointer. Does IBM use R1 for this on variants of the architecture that
have push/pop opcodes, or some other general-purpose register, or a
dedicated register?
Post by Hans-Peter Diettrich
As long as only one thread is running, the push/pop instructions must
not be atomic.
I hate to correct language usage, but "/need/ not be atomic" would be
clearer.
Post by Hans-Peter Diettrich
Multiple threads introduce many more problems, because their return
stacks must never get mixed. Furthermore each thread must have its own
stack, again no conflicts.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-01 22:43:10 UTC
Permalink
Post by Mark Morgan Lloyd
No, I meant that Bernd suggested R1 earlier as a simulated stack
pointer. Does IBM use R1 for this on variants of the architecture that
have push/pop opcodes, or some other general-purpose register, or a
dedicated register?
R1 was only meant as an example.

The true linkage and stack conventions of (most) IBM OSes are like this:

R13 points to the save area of the current procedure or function (that
is, the current stack frame of this procedure; at the beginning of this
stack frame there is always a register save area for the 16 general
purpose registers, which contain the return address, entry point and
parameter base address, too). Following this save area, we have the local
(automatic) variables of the current procedure or function. If a
parameter list for the call of a subsequent procedure has to be built,
this is also done in a work area which is part of the stack frame of this
function, and before calling the next function, R1 is set to point to the
beginning of this area.

R15 always contains the entry point address for the new procedure, and
R14 always contains the return address. All 16 registers are saved in the
prologue of the new procedure and restored upon return - with the
exception of R13 itself, which is handled separately: the save areas are
chained together using the 2nd and 3rd word (backward and forward
pointer). This is how the contents of R13 are saved.

There are machine instructions to do the register saves and restores in a
convenient way - all the registers from R0 to R15 (with the exception of
R13) are saved and restored using one machine instruction.
It looks like this:

STM R14,R12,12(R13) - that is, R14, R15, and R0 through R12 are saved
at offset 12 from R13.
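For the Pascal-minded readers, the classic 72-byte save area that R13
points at can be pictured roughly as the record below (only a sketch; the
authoritative layout is in the IBM linkage documentation):

  type
    PSaveArea = ^TSaveArea;
    TSaveArea = record
      Reserved     : Cardinal;                  { word 0: reserved            }
      BackPtr      : PSaveArea;                 { word 1: caller's save area  }
      FwdPtr       : PSaveArea;                 { word 2: callee's save area  }
      SavedR14     : Cardinal;                  { word 3: return address      }
      SavedR15     : Cardinal;                  { word 4: entry point address }
      SavedR0toR12 : array[0..12] of Cardinal;  { words 5..17: R0 through R12 }
    end;
    { STM R14,R12,12(R13) fills everything from SavedR14 onwards in one store. }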

Kind regards

Bernd
Mark Morgan Lloyd
2013-09-02 08:37:22 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
No, I meant that Bernd suggested R1 earlier as a simulated stack
pointer. Does IBM use R1 for this on variants of the architecture that
have push/pop opcodes, or some other general-purpose register, or a
dedicated register?
R1 was only meant as an example.
R13 points to the save areas of the current procedure or function (that is
the current stack frame of this procedure; at the beginning of this
stack frame
there is always a register save area for the 16 general purpose registers,
which contain the return adress, entry point and parameter base adress,
too).
Following this save area, we have the local (automatic) variables of the
current procedure or function. If a parameter list for the call of a
subsequent
procedure has to be built, this is also done in a work area which is
part of
the stack frame of this function, and before calling the next function,
R1 is set to point to the beginning of this area.
R15 always contains the entry point address for the new procedure,
and R14 always contains the return address. All 16 registers are saved
in the prologue of the new procedure and restored upon return - with the
exception of R13 itself, which is handled separately - the save areas
are chained together using the 2nd and 3rd word (backward and
foreward pointer). This is the way how the contents of R13 are saved.
The are machine instructions to do the register saves and restores in
a convenient way - all the registers from R0 to R15 (with the exception
of R13) are saved and restored using one machine instruction.
STM R14,R12,12(R13) - that is, R14,R15, and R0 thru R12 are saved
at offset 12 from R13.
That's obviously far friendlier to a language like Pascal than the
examples in most assembler-level treatises - I wonder how compatible it
is with the informal description of the Linux ABI at
http://linuxvm.org/present/SHARE99/S8139db.pdf ?

I notice that Paul has added updates to
http://wiki.lazarus.freepascal.org/ZSeries over the last week or so, but I
haven't had time to read them in detail yet.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Tomas Hajny
2013-09-02 09:26:26 UTC
Permalink
.
.
Post by Mark Morgan Lloyd
I notice that Paul has added updates to
http://wiki.lazarus.freepascal.org/ZSeries over the last week or so but
haven't had time to read them in detail yet.
Looking at Paul's articles in Wiki, I've noticed that he has used 2.6.0 or
2.6.2 as his basis. This doesn't look like a good idea to me, because new
platforms are always added in trunk and thus the modifications will need
to be updated when merging them to trunk. Moreover, I believe that the
latest additions related to UnicodeString support, etc., may be especially
useful to z370 when tackling the EBCDIC versus ASCII challenge. Just my 2
cents...

Tomas
Mark Morgan Lloyd
2013-09-03 07:54:25 UTC
Permalink
Post by Tomas Hajny
.
.
Post by Mark Morgan Lloyd
I notice that Paul has added updates to
http://wiki.lazarus.freepascal.org/ZSeries over the last week or so but
haven't had time to read them in detail yet.
Looking at Paul's articles in Wiki, I've noticed that he has used 2.6.0 or
2.6.2 as his basis. This doesn't look like a good idea to me, because new
platforms are always added in trunk and thus the modifications will need
to be updated when merging them to trunk. Moreover, I believe that the
latest additions related to UnicodeString support, etc., may be especially
useful to z370 when tackling the EBCDIC versus ASCII challenge. Just my 2
cents...
On the other hand, since he was in the position of trying to work out
how everything hung together without feeling that he was already "an
insider", I think that he might be forgiven for working on a stable
version. This is, after all, how the MIPS target was implemented (it's
regrettable that it wasn't folded back into the trunk a bit more
promptly, which might have saved work).

I've just been reading through his recent additions to the wiki and I
note that he recognises that (comparatively) recent additions to the
architecture might be useful. If he's actually making progress on this
it would be very interesting to have a status report.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-02 22:15:19 UTC
Permalink
Post by Mark Morgan Lloyd
That's obviously far friendlier to a language like Pascal than the
examples in most assembler-level treatises- I wonder how compatible it
is with the description of the Linux ABI informally at
http://linuxvm.org/present/SHARE99/S8139db.pdf ?
Some remarks regarding code generation:

If you have a language like C, which doesn't support nested procedure
definitions, it's perfectly simple. You can reach the local (auto)
variables using register R13, and the parameters using R1. You only need
another register to access the static data and some kind of heap
management to support malloc() etc., and that's about all.

With "classical" 360 machines, you had the problem that the offsets in
the machine
instructions only were 12 bits long, so you could only address 4 k from
the position
of a base register directly. That is, if your automatic data (of one
procedure) was
larger than 4 k, you were in trouble. Data after the 4 k barrier had to
be addressed
using two steps; first compute the address and the fetch the data - for
example.

With new z-Arch instructions, this is no problem any more.
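A concrete Pascal case of the old problem - with a local like the one
below, everything past offset 4095 could not be reached with a single
base-plus-displacement instruction and needed an extra address
computation first (sketch only):

  procedure FillBigBuffer;
  var
    Buffer: array[0..8191] of Byte;   { 8 k of automatic data - spans two 4 k ranges }
    I: Integer;
  begin
    for I := Low(Buffer) to High(Buffer) do
      Buffer[I] := 0;
  end;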

The same goes for the procedure code itself; PASCAL/VS (an old IBM
compiler from the 1980s) limited the size of procedures to 8 k - which is
some hundred lines of source code. This, too, is no longer a problem
today, because when you use the new relative branches you don't need base
registers for the code area.

With a language like Pascal, which allows the nesting of procedures and
access to auto variables that are defined outside the local procedure
(that is, in procedures above the local procedure), you need to retrieve
the stack frame of that procedure first. This is done by walking up the
chain of the save areas and loading the base address of the stack frame
of the procedure of interest into another base register, different from
R13 (for example). This is a problem that a C compiler doesn't have - but
it's well known to PL/1, too.
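A small Pascal example of this situation - the inner procedure has to
reach Total, which lives in the stack frame of the outer one:

  procedure SumList(const A: array of Integer);
  var
    Total: Integer;
    I: Integer;

    procedure AddOne(Value: Integer);
    begin
      Total := Total + Value;   { uplevel access to a local of SumList }
    end;

  begin
    Total := 0;
    for I := Low(A) to High(A) do
      AddOne(A[I]);
    Writeln(Total);
  end;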

This is all very well known and can be derived from the other compilers
that deal with the z-Arch environment, for example the IBM commercial ones.

The z-Arch has evolved very much in the last 10 years, compared to 360
and 370 days, and this makes code generation and the life of compiler
builders much easier.

Kind regards

Bernd
Sven Barth
2013-09-03 05:50:46 UTC
Permalink
Post by Bernd Oppolzer
With "classical" 360 machines, you had the problem that the offsets in
the machine
Post by Bernd Oppolzer
instructions only were 12 bits long, so you could only address 4 k from
the position
Post by Bernd Oppolzer
of a base register directly. That is, if your automatic data (of one
procedure) was
Post by Bernd Oppolzer
larger than 4 k, you were in trouble. Data after the 4 k barrier had to
be addressed
Post by Bernd Oppolzer
using two steps; first compute the address and the fetch the data - for
example.
Post by Bernd Oppolzer
With new z-Arch instructions, this is no problem any more.
FPC already supports some CPUs which have such restrictions. It's no real
problem to split a single branch/move instruction into an address calculation
followed by the branch/move. I'm doing this, for example, for the Coldfire
m68k variant.
Post by Bernd Oppolzer
A language like Pascal, which allows the nesting of procedures and the
access
in procedures
Post by Bernd Oppolzer
above the local procedure), you need to retrieve the stack frame of that
procedure
Post by Bernd Oppolzer
first. This is done by walking up the chain of the save areas and load
the base address
Post by Bernd Oppolzer
of the stack frame of the interesting procedure into another base
register, different
Post by Bernd Oppolzer
from R13 (for example). This is a problem, that a C compiler doesn't have
- but it's
Post by Bernd Oppolzer
well known to PL/1, too.
In FPC we AFAIK pass the parent frame to a nested function as an additional
parameter.
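Roughly speaking - an illustrative, made-up sketch only, not what the
compiler literally emits - the nested routine behaves as if it had been
rewritten with the parent's frame passed explicitly:

program ParentFrameSketch;

type
  POuterFrame = ^TOuterFrame;
  TOuterFrame = record
    Counter: Integer;
  end;

{ what a nested Inner conceptually becomes: the parent's frame arrives as an
  extra, hidden parameter }
procedure Inner_Lowered(ParentFrame: POuterFrame);
begin
  Inc(ParentFrame^.Counter);
end;

var
  Frame: TOuterFrame;
begin
  Frame.Counter := 0;
  Inner_Lowered(@Frame);
  WriteLn(Frame.Counter);   { 1 }
end.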

Regards,
Sven
Mark Morgan Lloyd
2013-09-03 06:45:30 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
That's obviously far friendlier to a language like Pascal than the
examples in most assembler-level treatises- I wonder how compatible it
is with the description of the Linux ABI informally at
http://linuxvm.org/present/SHARE99/S8139db.pdf ?
if you have a language like C which doesn't support nested procedure
definitions,
it's perfectly simple. You can reach the local (auto) variables using
register R13, and
the parameters using R1. You only need another register to access the
static data
and some kind of heap management to support malloc() etc., and that's
about all.
I believe this sort of thing is also an issue when a function recurses.
Post by Bernd Oppolzer
With "classical" 360 machines, you had the problem that the offsets in
the machine
instructions only were 12 bits long, so you could only address 4 k from
the position
of a base register directly. That is, if your automatic data (of one
procedure) was
larger than 4 k, you were in trouble. Data after the 4 k barrier had to
be addressed
using two steps; first compute the address and the fetch the data - for
example.
With new z-Arch instructions, this is no problem any more.
Same goes for the procedure code itself; PASCAL/VS (an old IBM compiler
of the 1980 years) limited the size of procedures to 8 k - which is some
hundred
lines of source code. This, too, is no problem today any more, because,
when
you use the new relative branches, you don't need base registers for the
code area.
Anything that artificially limits function size, or forces the code
generator to jump through hoops to work around architecture issues, is a
problem /particularly/ when an external tool has automatically generated
(Pascal) source. I'm not sure how many people use that technique these
days but it was fairly popular in the era when Fortran was dominant, and
there are still tools that generate C or Java source.
Post by Bernd Oppolzer
A language like Pascal, which allows the nesting of procedures and the
access
in procedures
above the local procedure), you need to retrieve the stack frame of that
procedure
first. This is done by walking up the chain of the save areas and load
the base address
of the stack frame of the interesting procedure into another base
register, different
from R13 (for example). This is a problem, that a C compiler doesn't
have - but it's
well known to PL/1, too.
This all is very well known and can be derived from the other compilers
that deal
with the z-Arch environment, for example the IBM commercial ones.
The z-Arch has evolved very much in the last 10 years, compared to 360
and 370 days,
and this makes code generation and the life of compiler builders much
easier.
[Nod] I believe that most of the above- as well as things like IEEE
floating point support- is in late-model 390s and in the Hercules emulator.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-03 11:41:55 UTC
Permalink
Post by Mark Morgan Lloyd
Post by Bernd Oppolzer
if you have a language like C which doesn't support nested procedure
definitions,
it's perfectly simple. You can reach the local (auto) variables using
register R13, and
the parameters using R1. You only need another register to access the
static data
and some kind of heap management to support malloc() etc., and that's
about all.
I believe this sort of thing is also an issue when a function recurses.
C supports recursion of functions, and it works without problems with the
scheme outlined above and in the previous mails.
Post by Mark Morgan Lloyd
Post by Bernd Oppolzer
With "classical" 360 machines, you had the problem that the offsets
in the machine
instructions only were 12 bits long, so you could only address 4 k
from the position
of a base register directly. That is, if your automatic data (of one
procedure) was
larger than 4 k, you were in trouble. Data after the 4 k barrier had
to be addressed
using two steps; first compute the address and the fetch the data -
for example.
With new z-Arch instructions, this is no problem any more.
Same goes for the procedure code itself; PASCAL/VS (an old IBM compiler
of the 1980 years) limited the size of procedures to 8 k - which is
some hundred
lines of source code. This, too, is no problem today any more,
because, when
you use the new relative branches, you don't need base registers for
the code area.
Anything that artificially limits function size, or forces the code
generator to jump through hoops to work around architecture issues, is
a problem /particularly/ when an external tool has automatically
generated (Pascal) source. I'm not sure how many people use that
technique these days but it was fairly popular in the era when Fortran
was dominant, and there are still tools that generate C or Java source.
Agreed. But although the limit of 8 k per procedure seems strange from
today's point of view, there were significant projects done with this
compiler, for example the first TCP/IP stack for the 370 platform in the
early 1980s (inside IBM, in Pascal !!).

I still have lots of tools generating source code, for example PL/1 and C:
XML validator definitions, derived from XSDs (to speed up my XML parser) -
well, those are definitions, not executable code, but occasionally very
large, if the XSD is large - and database access routines ...
Post by Mark Morgan Lloyd
Post by Bernd Oppolzer
A language like Pascal, which allows the nesting of procedures and
the access
of auto variables that are defined outside the local procedure (that
is: in procedures
above the local procedure), you need to retrieve the stack frame of
that procedure
first. This is done by walking up the chain of the save areas and
load the base address
of the stack frame of the interesting procedure into another base
register, different
from R13 (for example). This is a problem, that a C compiler doesn't
have - but it's
well known to PL/1, too.
This all is very well known and can be derived from the other
compilers that deal
with the z-Arch environment, for example the IBM commercial ones.
The z-Arch has evolved very much in the last 10 years, compared to
360 and 370 days,
and this makes code generation and the life of compiler builders much
easier.
[Nod] I believe that most of the above- as well as things like IEEE
floating point support- is in late-model 390s and in the Hercules emulator.
yes !!

Hans-Peter Diettrich
2013-09-01 22:49:31 UTC
Permalink
Post by Mark Morgan Lloyd
Post by Hans-Peter Diettrich
Post by Mark Morgan Lloyd
Question (to save me digging into the manuals right now): where a
recent machine uses the dedicated stack instructions, is the stack
pointer one of the standard registers? In other words, can push/pop
operations be trivially and exactly simulated for older hardware?
You mean thread safety?
No, I meant that Bernd suggested R1 earlier as a simulated stack
pointer. Does IBM use R1 for this on variants of the architecture that
have push/pop opcodes, or some other general-purpose register, or a
dedicated register?
Dunno, sorry. Perhaps a new register has been introduced with the new
instructions?
Post by Mark Morgan Lloyd
Post by Hans-Peter Diettrich
As long as only one thread is running, the push/pop instructions must
not be atomic.
I hate to correct language usage, but "/need/ not be atomic" would be
clearer.
Much appreciated :-)

DoDi
Sven Barth
2013-09-01 10:26:06 UTC
Permalink
Post by Bernd Oppolzer
And then: I have some doubts about the linkage between FPC and the GNU
tools,
like as and ld. Why is it easier to port FPC to a Linux based system?
If the compiler translates into an abstract intermediate language first
and then into an abstract assembly language maybe - for an abstract machine
like the P-machine - then the nature of the assembler and linker used
should be irrelevant. Maybe there is some misunderstanding on my - or
our - part;
I have the Wirth compilers in mind, and there is a clear separation between
the machine independent parts - phase 1 - which generates P-code and
the machine dependent parts - phase 2 and runtime. Even if there is no such
separation in FPC, it should IMO be possible to develop and test the code
generation separately.
If someone wants to port the compiler to a new target processor it is
advisable to look whether there exists an OS that is already supported
by FPC, because then "only" the code generator and the CPU specific
parts of the RTL need to be implemented while the remaining RTL can be
reused which simplifies the first steps of the port. Otherwise you'd
need to implement the code generator and a more or less complete RTL.
So as Linux seems to be available at least for some variants of the CPU
I would strongly suggest to target Linux first and other OSes later.

Also the compile process of FPC is roughly this:
- for each used unit:
- parse the unit and generate a node tree for each
procedure/function/method (basically platform independent)
- generate a CPU specific linear assembler representation of each
node tree (this representation is independent of the specific assembler
used)
- if an external assembler (e.g. GNU as) is used: convert the
assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler
lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument
the external linker
- call the linker (internal linkers work directly on the in memory
information the compiler has gathered)

For a new port it is advisable not to implement internal assemblers and
linkers, because that means more work, but to use existing tools for the
given platform. And here, as for the RTL, assembler/linker interfaces for
the GNU assembler and linker already exist, which simplifies a port. Later
on, additional assembler/linker interfaces for other assemblers/linkers can
be added, or even an internal assembler and/or linker can be written.

Regards,
Sven
Mark Morgan Lloyd
2013-09-01 13:00:11 UTC
Permalink
Post by Sven Barth
If someone wants to port the compiler to a new target processor it is
advisable to look whether there exists an OS that is already supported
by FPC, because then "only" the code generator and the CPU specific
parts of the RTL need to be implemented while the remaining RTL can be
reused which simplifies the first steps of the port. Otherwise you'd
need to implement the code generator and a more or less complete RTL.
So as Linux seems to be available at least for some variants of the CPU
I would strongly suggest to target Linux first and other OSes later.
- parse the unit and generate a node tree for each
procedure/function/method (basically platform independant)
- generate a CPU specific linear assembler representation of each
node tree (this representation is independant of the specific assembler
used)
- if an external assembler (e.g. GNU as) is used: convert the
assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler
lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument
the external linker
- call the linker (internal linkers work directly on the in memory
information the compiler has gathered)
Presumably that implies that it's not feasible to have an internal
assembler followed by a standard platform-specific linker. The reason I
highlight this combination is that I've had no success finding a free
assembler other than GNU as (gas) which covers all development
platforms, in particular "The free Tachyon Legacy Assembler is available
for MVS 3.8 and Linux/390" which obviously makes it NBG for the initial
stages of development on x86 Linux or Windows.

One of the links I posted earlier demonstrates that GCC is generally
tailored to suit the prevalent assembler on each operating system
(Linux, MUSIC/SP, CMS or whatever). That obviously doesn't imply that
the macro packages that serious assembler programmers would expect are
available.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Jonas Maebe
2013-09-01 14:02:08 UTC
Permalink
- parse the unit and generate a node tree for each procedure/function/method (basically platform independant)
- generate a CPU specific linear assembler representation of each node tree (this representation is independant of the specific assembler used)
- if an external assembler (e.g. GNU as) is used: convert the assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument the external linker
- call the linker (internal linkers work directly on the in memory information the compiler has gathered)
Presumably that implies that it's not feasible to have an internal assembler followed by a standard platform-specific linker.
The Linux and *BSD x86 platforms by default all use an internal assembler and an external linker. Even an internal linker combined with an external assembler works. The internal linker can load external object files (otherwise it would not be able to link in static libraries).


Jonas
Bernd Oppolzer
2013-09-01 13:49:09 UTC
Permalink
Post by Sven Barth
If someone wants to port the compiler to a new target processor it is
advisable to look whether there exists an OS that is already supported
by FPC, because then "only" the code generator and the CPU specific
parts of the RTL need to be implemented while the remaining RTL can be
reused which simplifies the first steps of the port. Otherwise you'd
need to implement the code generator and a more or less complete RTL.
So as Linux seems to be available at least for some variants of the
CPU I would strongly suggest to target Linux first and other OSes later.
- parse the unit and generate a node tree for each
procedure/function/method (basically platform independant)
- generate a CPU specific linear assembler representation of each
node tree (this representation is independant of the specific
assembler used)
- if an external assembler (e.g. GNU as) is used: convert the
assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler
lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument
the external linker
- call the linker (internal linkers work directly on the in memory
information the compiler has gathered)
Thank you very much for that, that made things much clearer for me.

So the compiler relies heavily on the external assembler and the syntax it
supports, as long as you don't want to make changes to step 2 (that is,
change the linear assembler representation, which IMO should not be done in
the first step).

And: the assembler is not called once, but for every unit.

So here, I think, we have some problems or issues, because, as already
pointed out, the z-Arch doesn't have PUSH and POP instructions, and I guess
that the outcome of the linear assembler representation will not be very
well suited to what the z-Arch instruction set provides, even though in the
meantime there are some 1500 instructions.

Understanding that, I would now like to have some description of the linear
assembler representation that FPC generates; that is: it is of course not
target-specific, but it does of course make some assumptions about the type
of the underlying hardware. Maybe, for example, it assumes the existence of
PUSH and POP instructions and some number of registers which can hold fixed
point and floating point values and which are the target of the PUSH and POP
instructions (and, of course, ASCII). The z-Arch hardware, in contrast,
normally does not access parameters - for example - by issuing individual
PUSH or POP instructions; instead - for example - if there are ten parameters
requiring - say - 64 bytes of storage, it increments the "stack pointer" only
once (by 64) and accesses the parameters by relative offsets to the value of
the "stack pointer". Simulating the individual PUSHs and POPs by assembler
macros would be possible, but it would be a waste of time.

So my question is: is it possible to modify the outcome of step 2 (the
linear assembler representation) depending on the target platform - which is
desirable in my opinion for performance reasons - or should we stick with the
outcome of step 2 - which probably contains PUSH and POP instructions in gas
syntax etc. - and simply try to convert them in one way or another to z-Arch
instructions - z Assembler macros or a more intelligent assembler writer
function which converts gas syntax to z Assembler syntax - and accept the
performance degradation in the first place?

BTW: is it possible to print the linear assembler representation - the
outcome of step 2 - which in my understanding is NOT target-specific - and
compare it - for example - to the assembler code generated for MIPS? I
believe that something could indeed be learned from that mapping ...
Post by Sven Barth
For a new port it is advisable to not implement internal assemblers
and linkers, because that means more work, but to use existing tools
for the given platform. And here like for the RTL assembler/linker
interfaces for the GNU assembler and linker already exist which
simplify a port. Later on additional assembler/linker interfaces for
other assemblers/linkers can be added or even an internal assembler
and/or linker can be written.
I, too, would try to rely on existing assemblers and linkers,
but I have the feeling that HLASM (or free assemblers supporting
the z-Arch vocabulary) is a better choice than gas. I believe that
there are some people out there waiting for FPC to become available
on z-Arch, and they are not all Unix-aware, but they read and
understand the classical z-machinery well and would like it
if FPC were available there, without additional prerequisites
(from their point of view).
Post by Sven Barth
Regards,
Sven
Mark Morgan Lloyd
2013-09-01 14:02:36 UTC
Permalink
Bernd Oppolzer wrote:

I'm about to head out, so have to be extremely brief.
Post by Bernd Oppolzer
Thank you very much for that, that made things much clearer for me.
So the compiler relies heavily on the external assembler and the syntax
it supports,
as long as you don't want to do changes to step 2 (that is, change the
linear assembler
representation, which IMO should not be done in the first step).
And: the assembler is not called once, but for every unit.
So here, I think, we have some problems or issues, because, as already
pointed out,
the z-Arch doesn't have PUSH and POP instructions, and I guess that the
outcome
of the linear assembler representation will not be very suitable to the
things that the
z-Arch instruction set provides, although in the meantime there are some
1500 instructions.
Understanding that, I would now like to have some description of the
linear assembler
representation that FPC generates, that is: it is of course not
target-specific, but it does of
course do some assumptions on the type of the underlying hardware.
Look at the output when using FPC's -a options, for example -aln... that
might in practice need the EXTDEBUG setting during compilation but I
can't go into more detail now.

Push will typically be used to put parameters onto the stack, otherwise
they'll be accessed by indexed operation. The stack frame is discarded
by target-specific code.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-01 14:39:31 UTC
Permalink
Post by Mark Morgan Lloyd
I'm about to head out, so have to be extremely brief.
Post by Bernd Oppolzer
Thank you very much for that, that made things much clearer for me.
So the compiler relies heavily on the external assembler and the
syntax it supports,
as long as you don't want to do changes to step 2 (that is, change
the linear assembler
representation, which IMO should not be done in the first step).
And: the assembler is not called once, but for every unit.
So here, I think, we have some problems or issues, because, as
already pointed out,
the z-Arch doesn't have PUSH and POP instructions, and I guess that
the outcome
of the linear assembler representation will not be very suitable to
the things that the
z-Arch instruction set provides, although in the meantime there are
some 1500 instructions.
Understanding that, I would now like to have some description of the
linear assembler
representation that FPC generates, that is: it is of course not
target-specific, but it does of
course do some assumptions on the type of the underlying hardware.
Look at the output when using FPC's -a options, for example -aln...
that might in practice need the EXTDEBUG setting during compilation
but I can't go into more detail now.
Push will typically be used to put parameters onto the stack,
otherwise they'll be accessed by indexed operation. The stack frame is
discarded by target-specific code.
Thank you for that; I will take a look at it, although I have some doubts
whether the output is "target-specific" or "not target-specific" - and
whether my understanding of the linear assembler representation as being
"not target-specific" is right.

For that question I would like some statement from the core developers:
how would you deal with a machine that has no built-in PUSH instruction?
For example, if a function call puts five parameters on the stack, which is

LD A
PUSH
LD B
PUSH
LD C
PUSH
LD D
PUSH
LD E
PUSH
CALL FUNC

given an accumulator which is the target of LD, and a PUSH instruction which
pushes the content of the accumulator onto the stack.

In my understanding this could be the non-target-specific representation of
the calling sequence

The z-Arch could produce something like

L R5,A
AHI R1,4
ST R5,0(R1)
L R5,B
AHI R1,4
ST R5,0(R1)
L R5,C
AHI R1,4
ST R5,0(R1)
L R5,D
AHI R1,4
ST R5,0(R1)
L R5,E
AHI R1,4
ST R5,0(R1)

here every PUSH is emulated by the AHI (increment of the "stack pointer" R1)
and then the indirect store.

But more efficient would be:

L R5,A
ST R5,0(R1)
L R5,B
ST R5,4(R1)
L R5,C
ST R5,8(R1)
L R5,D
ST R5,12(R1)
L R5,E
ST R5,16(R1)
AHI R1,20

Still more efficient, if you use other registers (not only R5): then you can
perhaps store all the values onto the stack using only one instruction (STM)
- if the variables are loaded into consecutive registers (R5, R6, R7 and so
on).

That's what the existing compilers on z-Arch normally do - they don't
compile the PUSH instructions one by one as in the first example; instead,
as there are no PUSH/POP instructions provided by the hardware, they make
some effort to do only one increment of the stack pointer (as outlined
above), which is done in the procedure or function prologue.

Now my question is:

do you think that this is a major problem for a FPC port to z-Arch?

Are my assumptions right so far?

Should we start with an easy solution and check the performance
implications later?
Maybe there is a clever solution to that ...

Kind regards

Bernd
Bernd Oppolzer
2013-09-01 14:55:17 UTC
Permalink
No need to answer that ... I understood in the meantime that FPC does NOT
rely on PUSH and POP instructions. Instead the linear assembler
representation is already fully CPU specific.

(which makes porting a bigger effort)

Kind regards

Bernd
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
I'm about to head out, so have to be extremely brief.
Post by Bernd Oppolzer
Thank you very much for that, that made things much clearer for me.
So the compiler relies heavily on the external assembler and the
syntax it supports,
as long as you don't want to do changes to step 2 (that is, change
the linear assembler
representation, which IMO should not be done in the first step).
And: the assembler is not called once, but for every unit.
So here, I think, we have some problems or issues, because, as
already pointed out,
the z-Arch doesn't have PUSH and POP instructions, and I guess that
the outcome
of the linear assembler representation will not be very suitable to
the things that the
z-Arch instruction set provides, although in the meantime there are
some 1500 instructions.
Understanding that, I would now like to have some description of the
linear assembler
representation that FPC generates, that is: it is of course not
target-specific, but it does of
course do some assumptions on the type of the underlying hardware.
Look at the output when using FPC's -a options, for example -aln...
that might in practice need the EXTDEBUG setting during compilation
but I can't go into more detail now.
Push will typically be used to put parameters onto the stack,
otherwise they'll be accessed by indexed operation. The stack frame
is discarded by target-specific code.
Thank you for that; I will take a look at it, although I have some doubts,
if the output is "target-specific" or "not target-specific" - and if
my understanding
of the linear assembler representation being "not target-specific" is
right.
how would you deal with a machine that has no built in PUSH instruction?
For example if a function call puts five parameters on the stack,
which is
LD A
PUSH
LD B
PUSH
LD C
PUSH
LD D
PUSH
LD E
PUSH
CALL FUNC
given an accumulator which is target of LD and a PUSH instruction
which PUSHes
the content of the accumulator to the stack.
In my understanding this could be the not-target specific
representation of
the calling sequence
The z-Arch could produce something like
L R5,A
AHI R1,4
ST R5,0(R1)
L R5,B
AHI R1,4
ST R5,0(R1)
L R5,C
AHI R1,4
ST R5,0(R1)
L R5,D
AHI R1,4
ST R5,0(R1)
L R5,E
AHI R1,4
ST R5,0(R1)
here evere PUSH is emulated by the AHI (increment of the "stack
pointer" R1)
and then the indirect store.
L R5,A
ST R5,0(R1)
L R5,B
ST R5,4(R1)
L R5,C
ST R5,8(R1)
L R5,D
ST R5,12(R1)
L R5,E
ST R5,16(R1)
AHI R1,20
still more efficient, if you use other registers (not only R5);
if so, you can maybe store all the values into the stack using only one
instruction (STM) - if the variables are loaded into consecutive
registers (R5, R6, R7 and so on).
That's what the existing compilers on z-Arch normally do - they don't
compile the PUSH instructions one by one as in the first example, but
in contrast,
as there are no PUSH/POP instructions provided by the hardware, they
do some efforts
to do at least only one increment to the stack pointer (like outlined
above) which
is done in the procedure or function prologue.
do you think that this is a major problem for a FPC port to z-Arch?
Are my assumptions right so far?
Should we start with an easy solution and check the performance
implications later?
Maybe there is a clever solution to that ...
Kind regards
Bernd
Florian Klämpfl
2013-09-01 16:01:03 UTC
Permalink
Post by Bernd Oppolzer
No need to answer to that ... I understood in the meantime that FPC does
NOT rely on
PUSH and POP instructions. Instead the linear assembler representation
is already fully
CPU specific.
(which makes porting a bigger effort)
Proof?
Bernd Oppolzer
2013-09-01 17:53:37 UTC
Permalink
Post by Florian Klämpfl
Post by Bernd Oppolzer
No need to answer to that ... I understood in the meantime that FPC does
NOT rely on
PUSH and POP instructions. Instead the linear assembler representation
is already fully
CPU specific.
(which makes porting a bigger effort)
Proof?
It's my opinion. If the compiler translates the source language
to machine code for an abstract target machine that is not too complicated
(but well suited to the needs of the source language), you only have to
translate
the operations of this abstract machine one by one to your real target
machine,
which seems to me to be an easier task.

Optimization can occur already in the stages, which are target-independent,
but later, too.

That's in my understanding Niklaus Wirth's P-Code approach. Furthermore,
there is a file interface between the two stages, so that the first part of
the compiler is very easy to port to new platforms. Only the second part has
to be modified (in theory). My port of the McGill compiler (which is a
descendant of Stanford, which is a descendant of the P4 compiler) works this
way. In fact, I only ported it from one IBM opsys to another, so there was
no problem at all. More interesting is whether I will succeed in porting it
to an ASCII and Intel based platform (Windows, Linux, OS/2) and what sort of
problems I will see when doing that.

When I did some extensions to the compiler (new statement types like
continue, break, return, which didn't exist before), I only had to
change the
first part of the compiler; the second one was not affected, because I
didn't
need new P-Code instructions.

But maybe the same idea is present in FPC, too. I have to take a closer look
at the tree representation of the FPC units after the first compile stage -
when my time allows me to do so. I don't have a real feeling at the moment
whether there is more work in the stage before that tree representation or
after it - that is, whether the tree representation looks more like the
source code or already more like the linear assembly representation.

Kind regards

Bernd
Florian Klämpfl
2013-09-01 18:10:17 UTC
Permalink
Post by Bernd Oppolzer
Post by Florian Klämpfl
Post by Bernd Oppolzer
No need to answer to that ... I understood in the meantime that FPC does
NOT rely on
PUSH and POP instructions. Instead the linear assembler representation
is already fully
CPU specific.
(which makes porting a bigger effort)
Proof?
It's my opinion.
If the compiler translates the source language
to machine code for an abstract target machine that is not too complicated
(but well suited to the needs of the source language), you only have to
translate
the operations of this abstract machine one by one to your real target
machine,
which seems to me to be an easier task.
You miss an important point: an intermediate abstract assembler form
exists, but it is never generated explicitly. Please study the compiler
sources first before making statements such as the above. E.g. the sparc
specific part of the compiler is only approx. 7700 lines.
Mark Morgan Lloyd
2013-09-01 18:30:05 UTC
Permalink
Post by Bernd Oppolzer
Post by Florian Klämpfl
Post by Bernd Oppolzer
No need to answer to that ... I understood in the meantime that FPC does
NOT rely on
PUSH and POP instructions. Instead the linear assembler representation
is already fully
CPU specific.
(which makes porting a bigger effort)
Proof?
It's my opinion. If the compiler translates the source language
to machine code for an abstract target machine that is not too complicated
(but well suited to the needs of the source language), you only have to
translate
the operations of this abstract machine one by one to your real target
machine,
which seems to me to be an easier task.
The problem here is that compiler design has moved on a lot since
Wirth's day. It's not difficult to write a compiler using e.g. recursive
descent or Meta-II which emits instructions for an abstract stack-based
machine, and that might be a good match for a CPU with a small number of
general-purpose registers. However it can be extremely difficult to
optimise this for a modern CPU with a large register file; it's far more
effective to give the frontend a rough idea of how many registers the
backend has available to it and to warn it about known peculiarities.
Post by Bernd Oppolzer
Optimization can occur already in the stages, which are target-independent,
but later, too.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-01 18:46:06 UTC
Permalink
Post by Mark Morgan Lloyd
The problem here is that compiler design has moved on a lot since
Wirth's day. It's not difficult to write a compiler using e.g.
recursive descent or Meta-II which emits instructions for an abstract
stack-based machine, and that might be a good match for a CPU with a
small number of general-purpose registers. However it can be extremely
difficult to optimise this for a modern CPU with a large register
file, it's far more effective to give the frontend a rough idea of how
many registers the backend has available to it and to warn it about
known peculiarities.
Agreed, but:

you write:

it's far more effective to give the frontend a rough idea ...

the frontend?

In my understanding the frontend of the compiler is the part that reads
the source (scanner, parser) and generates a compact representation
of the source - and different tables with identifiers, structure layouts
etc.

The commercial IBM compilers even have different frontends - one
for PL/1, one for C - but when looking at the internals, it is the same
compiler.

The need to know about the properties of the target machine appears
later, when it comes to code generation. This is in my opinion what the
backend does ...

but we will see ... I guess the proper boundary between frontend and backend
and its correct definition will always be a subject of discussion.

Kind regards

Bernd
Mark Morgan Lloyd
2013-09-01 19:37:02 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
The problem here is that compiler design has moved on a lot since
Wirth's day. It's not difficult to write a compiler using e.g.
recursive descent or Meta-II which emits instructions for an abstract
stack-based machine, and that might be a good match for a CPU with a
small number of general-purpose registers. However it can be extremely
difficult to optimise this for a modern CPU with a large register
file, it's far more effective to give the frontend a rough idea of how
many registers the backend has available to it and to warn it about
known peculiarities.
it's far more effective to give the frontend a rough idea ...
the frontend?
For the purpose of this discussion: the compiler source files in e.g.
/usr/local/src/fpc/fpcbuild/fpcsrc/compiler (but not in subdirectories).
Note assembler writers in ag*pas, which are somewhat scattered.

The CPU-specific backends are in subdirectories, e.g.
../compiler/sparc, .../compiler/mips and so on.

I wanted to avoid using terms like "portable", "non-machine-specific"
and so on because the point has been made that there isn't an entirely
clean division.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Sven Barth
2013-09-01 20:12:45 UTC
Permalink
Post by Bernd Oppolzer
But maybe the same idea is present in FPC, too. I have to take a closer
look
at the tree representation of the FPC units after the first compile stage -
when my time allows me to do so. I don't have a real feeling at the moment,
if there is more work in the stage before that tree representation or
after that -
that is, if the tree representation looks more like the source code or
already
more like the linear assembly representation.
The tree representation is more like the source code. You can get a
feeling for it by calling the compiler with the -vp parameter which will
write the tree for the complete compilation (so better use it only in
small test programs) to a tree.log file in the current directory.
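For example, a tiny made-up program of the sort that keeps the resulting
tree.log manageable:

program TreeDemo;
var
  a, b: Integer;
begin
  a := 2;
  b := a + 3;    { the addition becomes an add node in the dumped tree }
  WriteLn(b);
end.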

Regards,
Sven
Jonas Maebe
2013-09-01 17:28:30 UTC
Permalink
No need to answer to that ... I understood in the meantime that FPC does NOT rely on
PUSH and POP instructions. Instead the linear assembler representation is already fully
CPU specific.
(which makes porting a bigger effort)
You really want to study the compiler sources (e.g. starting with the AARCH64 support as Florian suggested) before making such statements. There is still an abstraction between the node representation and the assembler output, it's just not an architecture-independent RTL but instead generic code generation methods that can be overridden where necessary.


Jonas
Mark Morgan Lloyd
2013-09-01 17:49:52 UTC
Permalink
Post by Bernd Oppolzer
That's what the existing compilers on z-Arch normally do - they don't
compile the PUSH instructions one by one as in the first example, but in
contrast,
as there are no PUSH/POP instructions provided by the hardware, they do
some efforts
to do at least only one increment to the stack pointer (like outlined
above) which
is done in the procedure or function prologue.
do you think that this is a major problem for a FPC port to z-Arch?
No, but it's important to adhere to the calling conventions for the
various target operating systems. As a particular example, on Linux
compiled code /has/ to be able to link (statically or dynamically) to
libraries such as the resolver that maps computer names to IP addresses.
Post by Bernd Oppolzer
Are my assumptions right so far?
I think so. The important thing is that Pascal mandates a stack due to
the prevalence of recursion, and (in the case of Linux) the ABI
specifies a combined parameter/return-address stack. The question at
this point is obviously what registers the ABI reserves as stack and
frame pointers.
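To illustrate the recursion point with a made-up example: each activation of
the function below needs its own copy of n and its own return address, which
is exactly what the stack provides:

program RecursionDemo;

function Factorial(n: Integer): Int64;
begin
  if n = 0 then
    Factorial := 1
  else
    Factorial := n * Factorial(n - 1);   { each call gets a fresh frame }
end;

begin
  WriteLn(Factorial(10));   { 3628800 }
end.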
Post by Bernd Oppolzer
Should we start with an easy solution and check the performance
implications later?
Maybe there is a clever solution to that ...
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Jonas Maebe
2013-09-01 14:12:12 UTC
Permalink
Post by Bernd Oppolzer
If someone wants to port the compiler to a new target processor it is advisable to look whether there exists an OS that is already supported by FPC, because then "only" the code generator and the CPU specific parts of the RTL need to be implemented while the remaining RTL can be reused which simplifies the first steps of the port. Otherwise you'd need to implement the code generator and a more or less complete RTL.
So as Linux seems to be available at least for some variants of the CPU I would strongly suggest to target Linux first and other OSes later.
- parse the unit and generate a node tree for each procedure/function/method (basically platform independant)
- generate a CPU specific linear assembler representation of each node tree (this representation is independant of the specific assembler used)
- if an external assembler (e.g. GNU as) is used: convert the assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument the external linker
- call the linker (internal linkers work directly on the in memory information the compiler has gathered)
Thank you very much for that, that made things much clearer for me.
So the compiler relies heavily on the external assembler and the syntax it supports,
as long as you don't want to do changes to step 2 (that is, change the linear assembler
representation, which IMO should not be done in the first step).
As Sven stated above, the linear assembler representation is completely independent of the external assembler and the syntax it supports.
Post by Bernd Oppolzer
So here, I think, we have some problems or issues, because, as already pointed out,
the z-Arch doesn't have PUSH and POP instructions, and I guess that the outcome
of the linear assembler representation will not be very suitable to the things that the
z-Arch instruction set provides, although in the meantime there are some 1500 instructions.
The compiler does not in any way rely on the presence of push and pop instructions.
Post by Bernd Oppolzer
Understanding that, I would now like to have some description of the linear assembler
representation that FPC generates, that is: it is of course not target-specific, but it does of
course do some assumptions on the type of the underlying hardware. Maybe, for example,
it assumes the existence of PUSH and POP instructions and some number of registers
which can hold fixed point and floating point values and which are the target of the
PUSH and POP instructions (and, of course, ASCII).
The JVM does not have any registers at all and is supported by the compiler.
Post by Bernd Oppolzer
So my question is: is it possible to modify the outcome of step 2 (the linear
assembler representation) depending on the target platform
The linear assembler representation is already 100% platform-specific (as Sven mentioned above). FPC does not have a platform-independent internal RTL (register transfer language) representation.


Jonas
Bernd Oppolzer
2013-09-01 14:47:44 UTC
Permalink
Post by Jonas Maebe
Post by Bernd Oppolzer
So my question is: is it possible to modify the outcome of step 2 (the linear
assembler representation) depending on the target platform
The linear assembler representation is already 100% platform-specific (as Sven mentioned above). FPC does not have a platform-independent internal RTL (register transfer language) representation.
Thank you.

So my assumptions so far were plain wrong.

The difference between platforms is already in the stages above stage 2.

That is (cited from Sven's mail):

<citation>
Also the compile process of FPC is roughly this:
- for each used unit:
- parse the unit and generate a node tree for each
procedure/function/method (basically platform independant)
- generate a CPU specific linear assembler representation of each
node tree (this representation is independant of the specific assembler
used)
- if an external assembler (e.g. GNU as) is used: convert the
assembler lists to assembler files
- call the assembler (internal assemblers work on the assembler
lists directly) to generate the object file
- for external linkers (e.g. GNU ld): write a linkscript to instrument
the external linker
- call the linker (internal linkers work directly on the in memory
information the compiler has gathered)
</citation>

only the first step - the node tree - is platform independent, and the
translation from there is already CPU specific - oh, I see, it's written
there - I looked at the word "independant" in the parentheses - my fault ...

Sorry for that ...

Then the main effort is to understand what the contents of the node tree
mean and to build another variant of step 2 (for z-Arch).

Kind regards

Bernd
Mark Morgan Lloyd
2013-09-01 17:40:38 UTC
Permalink
Post by Bernd Oppolzer
only the first step - node tree - is platform independant, and
the translations from there is already CPU specific - oh, I see,
it's written there - I looked at the word "independant" in the
paranthese - my fault ...
Sorry for that ...
Then the main effort is to understand what the contents of the node tree
mean
and to build another variant of step 2 (for z-Arch).
At this point I'd throw in that one of the things the higher levels of
the compiler knows is the overall properties of the registers, i.e.
things like which ones are available for procedure parameters. This is
one of the things that the lower level has to specify, so the
lower-level units aren't there solely to do a macro-style substitution
converting the compiler's internal representation to a sequence of
assembler lines.

The corollary of this is that it's fairly common for a new target CPU to
necessitate higher-level changes, and these then have to be propagated
to all of the other targets. Which is why it's important to keep people
like Florian and Jonas happy :-)
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Bernd Oppolzer
2013-09-01 18:21:51 UTC
Permalink
Post by Mark Morgan Lloyd
At this point I'd throw in that one of the things the higher levels of
the compiler knows is the overall properties of the registers, i.e.
things like which ones are available for procedure parameters. This is
one of the things that the lower level has to specify, so the
lower-level units aren't there solely to do a macro-style substitution
converting the compiler's internal representation to a sequence of
assembler lines.
The corollary of this is that it's fairly common for a new target CPU
to necessitate higher-level changes, and these then have to be
propagated to all of the other targets. Which is why it's important to
keep people like Florian and Jonas happy :-)
ok, so to keep Florian, Jonas and Sven (and others) happy,
I would like to tell you that I am deeply impressed by the great
work you all have done here.

I appreciate that you discuss those things with me, and I'd like
to discuss things a little further, because before investing time
here, I would like to know as much as possible about the environment.
The last few hours gave me much insight. Thank you for that.

As I understand it now, we have several levels of the compiler:

a) scanning and parsing the source code, which leads to a "tree
representation" of the units, which IMO is a kind of in-storage
representation of the source code (and tables etc.), with few dependencies
on the target, if any. I don't think that information about available
registers etc. of the target machine is necessary at this level.

b) the "tree representation" is translated into a "linear assembly list",
which is target specific; from previous posts it sounds as if there are
generic methods which help with this, and those methods of course need
information about the target platform - but there is no "intermediate
language" at this stage like in the P-Code case. (I know of other compilers,
namely the IBM commercial ones, which at this stage translate for an
abstract target machine with an arbitrary number of registers, and the later
"real" code generator reduces it to the real number, for example 16, with
the missing registers simulated in storage.) This needs to be examined more.

c) the "linear assembly list" is written to files, more or less without
changes

d) the files are assembled using an external assembler (in our case);
it must be callable from the FPC compiler. There exists an interface for
gas;
interfaces to other assemblers have to be built.

e) in the same way an external linker is used to link the units together.

Is this correct so far?

I'm not sure if and when I will find the time to really jump into this and
do real work, but anyway: if we discuss these things now, it will remain in
the archives of the mailing list, and if others (like for example Paul
Robinson) read this, they don't have to discuss it again. So IMO it's useful
anyway.

Thank you, kind regards

Bernd
Mark Morgan Lloyd
2013-09-01 18:47:26 UTC
Permalink
Post by Bernd Oppolzer
Post by Mark Morgan Lloyd
At this point I'd throw in that one of the things the higher levels of
the compiler knows is the overall properties of the registers, i.e.
things like which ones are available for procedure parameters. This is
one of the things that the lower level has to specify, so the
lower-level units aren't there solely to do a macro-style substitution
converting the compiler's internal representation to a sequence of
assembler lines.
The corollary of this is that it's fairly common for a new target CPU
to necessitate higher-level changes, and these then have to be
propagated to all of the other targets. Which is why it's important to
keep people like Florian and Jonas happy :-)
Any comments from me below probably need checking by Florian et al.
Post by Bernd Oppolzer
ok, so to keep Florian, Jonas and Sven (and others) happy,
I would like to tell you that I am deeply impressed by the great
work you all have done here.
I appreciate that you discuss those things with me, and I'd like
to discuss things a little further, because before investing time
here, I would like to know as much as possible about the environment.
The last few hours gave me much insight. Thank you for that.
a) scanning and parsing the source code, which leads to a "tree
representation"
of the units, which IMO is a kind of in-storage-representation of the
source code
(and tables etc.), without much dependencies of the target, if any. I
don't think,
that informations about available registers etc. of the target machine
are necessary
at this level.
As a general point, I think it's worth considering that FPC (as a
particular example of a modern compiler) supports a lot of things that
weren't in e.g. classic Pascal (as described by Jensen & Wirth) which
are implemented at this level rather than necessarily requiring fancy
opcode sequences at the lowest levels. Dynamic strings and arrays,
reference counting associated with both of these, generics, custom
definitions of opcodes, and so on.
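For instance (a made-up fragment), both of the following are handled largely
by the front end and the RTL helpers it calls, rather than by exotic
instruction sequences in the backend:

program ManagedTypesDemo;

{$mode objfpc}

var
  Values: array of Integer;   { dynamic array, managed by the RTL }
  S: AnsiString;              { reference-counted string }
begin
  SetLength(Values, 3);
  Values[0] := 42;
  S := 'reference counted';
  WriteLn(S, ' ', Values[0]);
end.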
Post by Bernd Oppolzer
b) the "tree representation" is translated into a "linear assembly
list", which is
target specific; from previous posts it sounds as if there are generic
methods
which help with this, and those methods of course need information about
the
target platform - but there is no "intermediate language" at this stage
like
in the P-Code case. (I know of other compilers, namely the IBM commercial
ones, which translate in this stage for an abstract target machine which
has
an arbitrary number of registers, and the later "real" code generator
puts it
down to the real number, for example 16, and the missing registers are
simulated in storage). This needs to be examined more.
I think that register-based compilers are now fairly standard. It's
still worth considering other things that the compiler might usefully
understand, e.g. the sliding window mechanism used by SPARC (and
possibly by the Itanium).
Post by Bernd Oppolzer
c) the "linear assembly list" is written to files, more or less without
changes
d) the files are assembled using an external assembler (in our case);
it must be callable from the FPC compiler. There exists an interface for
gas;
interfaces to other assemblers have to be built.
It's possible to get the compiler to output script files to control
assembly and linkage, so the compiler and assembler don't /have/ to be
on the same system. Also there are already interfaces other than gas,
but I think a significant thing is that gas is much more widely
implemented than the alternatives.
Post by Bernd Oppolzer
e) in the same way an external linker is used to link the units together.
Is this correct so far?
I'm not sure if and when I will find the time to jump really into this
doing real work,
but anyway: if we discuss these things now, it will remain in the
archives of the mailing
list and if others (like for example Paul Robinson) read this, they
don't have to
discuss it again. So IMO it's useful anyway.
I noticed comment elsewhere this morning from Paul saying that he'd been
working on a new compiler, without any further detail.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Sven Barth
2013-09-01 20:09:08 UTC
Permalink
Post by Bernd Oppolzer
a) scanning and parsing the source code, which leads to a "tree
representation"
of the units, which IMO is a kind of in-storage-representation of the
source code
(and tables etc.), without much dependencies of the target, if any. I
don't think,
that informations about available registers etc. of the target machine
are necessary
at this level.
Correct. There are only very few exceptions, e.g. on m68k-amiga you can
use the "location" keyword to specify a register location for a
parameter or the result value (whereby of course you need to know which
registers are available).
Post by Bernd Oppolzer
b) the "tree representation" is translated into a "linear assembly
list", which is
target specific; from previous posts it sounds as if there are generic
methods
which help with this, and those methods of course need information about
the
target platform - but there is no "intermediate language" at this stage
like
in the P-Code case. (I know of other compilers, namely the IBM commercial
ones, which translate in this stage for an abstract target machine which
has
an arbitrary number of registers, and the later "real" code generator
puts it
down to the real number, for example 16, and the missing registers are
simulated in storage). This needs to be examined more.
It looks like this: all nodes are classes that descend from the tnode class.
E.g. for adding two values (ordinals or floats) there is the taddnode, which
descends from tbinarynode, which in turn descends from tnode. This taddnode
does the general type checking and often also determines whether the result
will be passed in a register, as a memory reference or whatever. Then there
is the tcgaddnode, which descends from taddnode and either provides a
platform independent implementation (not the case for taddnode) using the
general interface of the code generator, or at least provides a few platform
specific helpers and splits the implementation up a bit (in this case
tcgaddnode provides methods "second_addfloat", "second_cmpfloat",
"second_addordinal", etc., which can be implemented by the platform specific
node, which is (in case of ARM) called tarmaddnode and is a descendant of
tcgaddnode).
This node usually uses the platform specific taicpu to generate the
abstract assembler instructions.

This seems a bit abstract at first, but with time you get a feeling for
it...
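A deliberately simplified, made-up sketch of that pattern (the stand-in
declarations below are not the real compiler classes, and the method body
just prints instead of emitting instructions):

program NodePatternSketch;

{$mode objfpc}

type
  { stand-ins for the real compiler classes, only to make the sketch
    self-contained }
  tcgaddnode = class
    procedure second_addordinal; virtual; abstract;
  end;

  { what a z-Arch specific node might look like: it overrides one of the
    second_* methods and would emit its instructions via taicpu }
  tzarchaddnode = class(tcgaddnode)
    procedure second_addordinal; override;
  end;

procedure tzarchaddnode.second_addordinal;
begin
  WriteLn('here an A/AR-style add would be emitted');
end;

var
  n: tcgaddnode;
begin
  n := tzarchaddnode.Create;
  n.second_addordinal;
  n.Free;
end.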
Post by Bernd Oppolzer
c) the "linear assembly list" is written to files, more or less without
changes
Whereby the specified assembler writer is used.
Post by Bernd Oppolzer
d) the files are assembled using an external assembler (in our case);
it must be callable from the FPC compiler. There exists an interface for
gas;
interfaces to other assemblers have to be built.
gas is only one example. There are interfaces for a few other assemblers
as well (mostly on i386, but e.g. on ppc-macos (Classic Mac OS) we use
the Apple specific assembler tool)
Post by Bernd Oppolzer
e) in the same way an external linker is used to link the units together.
Correct.

Regards,
Sven
Sven Barth
2013-09-01 14:29:09 UTC
Permalink
On 01.09.2013 15:49, Bernd Oppolzer wrote:

Most points were already answered by Jonas, so I'll only tackle the following:
Post by Bernd Oppolzer
BTW: is it possible to print the linear assembler representation - the
outcome of step 2 -
which in my understanding is NOT target-specific - and compare it - for
example - to
the assembler code generated for MIPS? I believe that from that mapping
there could
indeed be learned something ...
There is currently no way to write out this linear assembler representation
as is; it can only be written through the assembler writers.

The entries of that linear representation are mostly located inside the
aasmtai.pas and $CPUTARGET/aasmcpu.pas units (each tai is one entry inside
the linear list), with one of the more important ones being taicpu, as this
is the one which encapsulates the platform specific CPU instructions.
Post by Bernd Oppolzer
Post by Sven Barth
For a new port it is advisable to not implement internal assemblers
and linkers, because that means more work, but to use existing tools
for the given platform. And here like for the RTL assembler/linker
interfaces for the GNU assembler and linker already exist which
simplify a port. Later on additional assembler/linker interfaces for
other assemblers/linkers can be added or even an internal assembler
and/or linker can be written.
I, too, would try to rely on existing assemblers and linkers,
but I have the feeling that HLASM (or free Assemblers supporting
the z-Arch vocabulary) is a better choice than gas. I believe that
there are some people out there waiting for FPC being available
on z-Arch, and they are not all Unix-aware, but they read and
understand the classical z-machinery well and would like it,
if FPC was available there, without additional prerequisites
(from their point of view).
The point is still that one should choose Linux as the first target OS for
that new platform, and thus the assembler must be supported there as well.
As GNU as is normally always there where there is a Linux, this would mean
that for the first steps of the implementation no new assembler writer needs
to be implemented (though this isn't necessarily rocket science). If HLASM
supports assembling code for Linux you can of course implement that one
first.
My suggestion is however to implement only as few things as possible as the
first step, which means "only" implementing the code generator and no new
RTL, no assembler writer and no linker interface. Later on, once Linux works
well enough (not everything needs to be implemented for this!), additional
assembler writers or target OSes can be added. You are working inside an
unfamiliar environment (the compiler's code) and thus you should make it as
easy for yourself as possible. Otherwise you might experience frustration
when doing this port...

Regards,
Sven
Marco van de Voort
2013-09-01 13:51:45 UTC
Permalink
Post by Mark Morgan Lloyd
One of the links I posted earlier demonstrates that GCC is generally
tailored to suit the prevalent assembler on each operating system
(Linux, MUSIC/SP, CMS or whatever). That obviously doesn't imply that
the macro packages that serious assembler programmers would expect are
available.
That cuts both ways. The 16-bit target uses nasm, and it is really noticeable
that it operates much slower than (G)AS.

Speed is also the main reason for an internal assembler. It is in turn
noticeably faster compared to AS (and then I'm not even discussing old-style
(non-section-based) smartlinking, where the difference is several orders of
magnitude).