Discussion:
Even more problems with lcc-win
jacobnavia
2017-03-24 10:25:14 UTC
Hi

I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...

The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,

and a LONG list of "places" where includes with the same name are stored.

Which one should I use? How could I figure out which ones are used by gcc?

To avoid these problems, I decided to move the includes I am using from
windows to linux, but that would mean that I port my standard C library
into linux, not an easy task, especially because the source code is 100%
adapted to lcc and not easily used by other compilers, and that could
introduce a new set of bugs!

Besides, this produces binary incompatibilities between the generated
code and what the functions in the library expect, even more trouble...

So, I have no other choice than to parse include files full of "gccisms"
and to make that work, and that starts with figuring out WHICH include
files I should use, hence my question in the subject line.

The standard is silent on this subject, not even specifying a common
command that all compilers could support, telling the user how the
search paths are built.
BartC
2017-03-24 11:04:09 UTC
Post by jacobnavia
Hi
I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
To avoid these problems, I decided to move the includes I am using from
windows to linux, but that would mean that I port my standard C library
into linux, not an easy task, especially because the source code is 100%
adapted to lcc and not easily used by other compilers, and that could
introduce a new set of bugs!
Besides, this produces binary incompatibilities between the generated
code and what the functions in the library expect, even more trouble...
So, I have no other choice than to parse include files full of "gccisms"
and to make that work, and that starts with figuring out WHICH include
files I should use, hence my question in the subject line.
I found exactly the same set of problems [with my experimental C
compiler that was supposed to be OS-agnostic]. Programs that also
compile under Windows with compilers other than gcc are good candidates.
Programs that expect to compile under Linux tend to be full of gcc
dependencies.

As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full of
'__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)

I would also, if trying to compile for Linux, need to provide a private
set of headers. But if a program uses /and depends/ on
__builtin_whatever, that won't help. FWIW, initial experiments used gcc
headers in these paths:

"/usr/lib/gcc/x86_64-linux-gnu/6/include/"
"/usr/lib/gcc/x86_64-linux-gnu/bits/"
"/usr/include/"

However, gcc headers seem to be full of special features such as inline
assembly code, as well as being huge. Programs using standard headers from
here also seem to involve twice as many include files as the equivalents
under Windows. Linux system headers seem to be a huge ugly mess.
Post by jacobnavia
The standard is silent on this subject, not even specifying a common
command that all compilers could support, telling the user how the
search paths are built.
I think this is up to the implementation. Except that on Linux, I don't
know to what extent implementations can do their own thing, if the
interface to the OS - via some C libraries - is defined through headers
in /usr/include.
--
Bartc
Ben Bacarisse
2017-03-24 11:44:40 UTC
BartC <***@freeuk.com> writes:
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?

<snip>
--
Ben.
jacobnavia
2017-03-24 11:59:24 UTC
Post by Ben Bacarisse
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?
<snip>
He doesn't. He said: "__builtin_expect is a gcc thing". Just read what
you quoted!
Ben Bacarisse
2017-03-24 14:06:23 UTC
Post by jacobnavia
Post by Ben Bacarisse
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?
<snip>
He doesn't. He said: "__builtin_expect is a gcc thing". Just read what
you quoted!
Yes, I too can read, but the paragraph was about the fact that gcc
and Linux can't be separated. A parenthetical remark after such a
statement would be expected to offer support, evidence or examples
related to it. It reads as if "a gcc thing" were an example of how you
can't "prise gcc and Linux apart".
--
Ben.
David Brown
2017-03-24 12:01:00 UTC
Post by Ben Bacarisse
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?
He doesn't know what it does, but deep down he knows it is probably
useful, documented in the obvious place (compiler builtins for gcc are
documented in the gcc manual, in the chapter called "builtins"), and
easily found using Google. That applies to pretty much everything he
has trouble with in Linux. Therefore, __builtin_expect must be a
"linuxy thing".
BartC
2017-03-24 12:39:09 UTC
Post by David Brown
Post by Ben Bacarisse
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?
He doesn't know what it does, but deep down he knows it is probably
useful,
No, really it isn't. Not when you're trying to write code where you let
the compiler get on with /its/ job and you get on with yours.

__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.

You don't want stuff like that cluttering up your code. If you're going
to those lengths, then you might as well write assembly code.

In this case, all I needed to know was whether a dummy macro for
__builtin_expect should be empty, or return some value, in order to get
around a stalled compile.

But typical gcc and/or Linux code (that is, code using Linux headers) is
just chock-full of this stuff. A Linux system header includes another,
and that includes another, and that includes... it just goes on and on.

(Why do I care? Because when duplicating those headers under Windows for
easier development, I have to scour the Linux system for all those
dependent headers! Apparently in Linux they're not allowed to be all in
one place.)

Post by David Brown
documented in the obvious place (compiler builtins for gcc are
documented in the gcc manual, in the chapter called "builtins"), and
easily found using Google. That applies to pretty much everything he
has trouble with in Linux. Therefore, __builtin_expect must be a
"linuxy thing".
Linux system headers are more complicated than the equivalent under
Windows, even using gcc. Fact. A program containing 20 standard includes
will end up fetching 20-30 headers on Windows, apart from gcc, where 40
headers are required (and much bigger ones too).

The same program using Linux headers will fetch 80 includes.
--
bartc
Scott Lurndal
2017-03-24 13:17:35 UTC
Post by BartC
Post by David Brown
He doesn't know what it does, but deep down he knows it is probably
useful,
No, really it isn't. Not when you're trying to write code where you let
the compiler get on with /its/ job and you get on with yours.
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
You don't want stuff like that cluttering up your code. If you're going
to those lengths, then you might as well write assembly code.
Pure nonsense. Having used this extensively in multiple codebases
(operating systems, hypervisors, functional models) encompassing
hundreds of thousands of lines of C-code, the idea that all that code
would be better off written in Assembler is just plain nonsense.
Post by BartC
In this case, all I needed to know was whether a dummy macro for
__builtin_expect should be empty, or return some value, in order to get
around a stalled compile.
But typical gcc and/or Linux code (that is, code using Linux headers) is
just chock-full of this stuff. A Linux system header includes another,
and that includes another, and that includes... it just goes on and on.
(Why do I care? Because when duplicating those headers under Windows for
easier development, I have to scour the Linux system for all those
dependent headers! Apparently in Linux they're not allowed to be all in
one place.)
It doesn't make _sense_ to put them all in one place. Some are provided
by the compiler (portland, intel or gcc); some are provided by the kernel and
some are provided by the distribution, with a fair division.
Post by BartC
Linux system headers are more complicated than the equivalent under
Linux is a far more capable system than Windows, one which supports
literally dozens of dissimilar architectures.
BartC
2017-03-24 13:38:04 UTC
Post by Scott Lurndal
Post by BartC
Linux system headers are more complicated than the equivalent under
Linux is a far more capable system than Windows, one which supports
literally dozens of dissimilar architectures.
I'm talking about standard headers for C, a language.

Besides, a particular set of headers, if it does depend on a particular
architecture, should only concern itself with one.

If I compile hello.c using lccwin's headers, I get this trace:

INCLUDE stdio.h FROM hello.c 2
INCLUDE safelib.h FROM c:/lcc64/include/stdio.h 158
INCLUDE _syslist.h FROM c:/lcc64/include/stdio.h 221

If I use a set of headers from Ubuntu, I get this:

INCLUDE stdio.h FROM hello.c 2
INCLUDE features.h FROM c:/bcc/include/stdio.h 28
INCLUDE stdc-predef.h FROM c:/bcc/include/features.h 343
INCLUDE sys/cdefs.h FROM c:/bcc/include/features.h 365
INCLUDE bits/wordsize.h FROM c:/bcc/include/sys/cdefs.h 416
INCLUDE gnu/stubs.h FROM c:/bcc/include/features.h 389
INCLUDE gnu/stubs-32.h FROM c:/bcc/include/gnu/stubs.h 8
INCLUDE stddef.h FROM c:/bcc/include/stdio.h 34
INCLUDE bits/types.h FROM c:/bcc/include/stdio.h 36
INCLUDE features.h FROM bits/types.h 27
INCLUDE bits/wordsize.h FROM bits/types.h 28
INCLUDE bits/typesizes.h FROM bits/types.h 122
INCLUDE libio.h FROM c:/bcc/include/stdio.h 75
INCLUDE _G_config.h FROM c:/bcc/include/libio.h 32
INCLUDE bits/types.h FROM c:/bcc/include/_G_config.h 10
INCLUDE stddef.h FROM c:/bcc/include/_G_config.h 16
INCLUDE wchar.h FROM c:/bcc/include/_G_config.h 21
INCLUDE stddef.h FROM c:/bcc/include/wchar.h 52
INCLUDE stdarg.h FROM c:/bcc/include/libio.h 50
INCLUDE stdarg.h FROM c:/bcc/include/stdio.h 84
INCLUDE bits/stdio_lim.h FROM c:/bcc/include/stdio.h 167
INCLUDE bits/sys_errlist.h FROM c:/bcc/include/stdio.h 856

A little bit different. While my own early headers give:

INCLUDE stdio.h FROM hello.c 2
INCLUDE stddef.h FROM ./headers/stdio.h 7

My first thought here was that it's a pity it's showing two headers rather
than just stdio.h; what can I do to simplify it? (stddef.h defines size_t)
--
bartc
Scott Lurndal
2017-03-24 13:50:33 UTC
Post by BartC
Post by Scott Lurndal
Post by BartC
Linux system headers are more complicated than the equivalent under
Linux is a far more capable system than Windows, one which supports
literally dozens of dissimilar architectures.
I'm talking about standard headers for C, a language.
For which the programmer uses the standard names and the C language
itself makes no requirements on the implementation other than to
correctly respond to '#include <ctype.h>'. An implementation is
free to store the headers in any fashion it pleases - an implementation
may not even have a file system.
Post by BartC
Besides, a particular set of headers, if it does depend on a particular
architecture, should only concern itself with one.
INCLUDE stdio.h FROM hello.c 2
INCLUDE safelib.h FROM c:/lcc64/include/stdio.h 158
INCLUDE _syslist.h FROM c:/lcc64/include/stdio.h 221
If I use a set of headers from Ubuntu, I get this
INCLUDE stdio.h FROM hello.c 2
INCLUDE features.h FROM c:/bcc/include/stdio.h 28
INCLUDE stdc-predef.h FROM c:/bcc/include/features.h 343
INCLUDE sys/cdefs.h FROM c:/bcc/include/features.h 365
INCLUDE bits/wordsize.h FROM c:/bcc/include/sys/cdefs.h 416
INCLUDE gnu/stubs.h FROM c:/bcc/include/features.h 389
INCLUDE gnu/stubs-32.h FROM c:/bcc/include/gnu/stubs.h 8
INCLUDE stddef.h FROM c:/bcc/include/stdio.h 34
INCLUDE bits/types.h FROM c:/bcc/include/stdio.h 36
INCLUDE features.h FROM bits/types.h 27
INCLUDE bits/wordsize.h FROM bits/types.h 28
INCLUDE bits/typesizes.h FROM bits/types.h 122
INCLUDE libio.h FROM c:/bcc/include/stdio.h 75
INCLUDE _G_config.h FROM c:/bcc/include/libio.h 32
INCLUDE bits/types.h FROM c:/bcc/include/_G_config.h 10
INCLUDE stddef.h FROM c:/bcc/include/_G_config.h 16
INCLUDE wchar.h FROM c:/bcc/include/_G_config.h 21
INCLUDE stddef.h FROM c:/bcc/include/wchar.h 52
INCLUDE stdarg.h FROM c:/bcc/include/libio.h 50
INCLUDE stdarg.h FROM c:/bcc/include/stdio.h 84
INCLUDE bits/stdio_lim.h FROM c:/bcc/include/stdio.h 167
INCLUDE bits/sys_errlist.h FROM c:/bcc/include/stdio.h 856
A little bit different.
Why would any C programmer care?

You omitted the justification for the linux arrangement from the quoted article above.
BartC
2017-03-24 14:06:52 UTC
Post by Scott Lurndal
Post by BartC
A little bit different.
Why would any C programmer care?
I think you've just explained it all. After all what does it matter if
these system headers - on which so many things depend - are a huge,
ugly, bloated unmaintainable mess?
Post by Scott Lurndal
You omitted the justification for the linux arrangement from the quoted article above.
The test program was Hello World, which only needs the declaration for
printf. It could also be written like this:

int printf(char*, ...);

int main (void) {printf("Hello, World!\n");}

Amazingly, barring typos above, it will still work, and without needing
to pre-process 5000 lines of gobbledygook.

But feel free to explain what all that extra stuff adds to this program.

A shared library somewhere exports printf(), and a C source file wants
to use it. In between should be a header giving the function signature
of printf. That is all that is needed.
--
bartc
Scott Lurndal
2017-03-24 14:20:22 UTC
Post by BartC
Post by Scott Lurndal
Post by BartC
A little bit different.
Why would any C programmer care?
I think you've just explained it all. After all what does it matter if
these system headers - on which so many things depend - are a huge,
ugly, bloated unmaintainable mess?
Again, tell me why a C programmer would care?
David Brown
2017-03-24 14:49:59 UTC
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Post by BartC
A little bit different.
Why would any C programmer care?
I think you've just explained it all. After all what does it matter if
these system headers - on which so many things depend - are a huge,
ugly, bloated unmaintainable mess?
Again, tell me why a C programmer would care?
Also, why would you (Bart) think that the Linux system is unmaintainable?

In reality, a key motivation for all these headers is that it /is/
maintainable. It means that if the folks maintaining the MIPS port of
Linux change something in their headers, the C library automatically
gets the benefit of the change - without the C library people needing to
do anything, including test on a MIPS machine. And when the gcc folk
added support for C11 to their headers, they did not need to co-ordinate
with the makers of a dozen different C libraries or the maintainers of
three dozen Linux ports - they make the change in /their/ headers, and
anyone with that version of gcc gets the benefits.

With your little Windows compilers, if MS adds new features to their
API, you need to change your compilers' headers and libraries to match -
/that/ is the maintenance nightmare.
Gareth Owen
2017-03-24 16:35:13 UTC
Post by David Brown
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Post by BartC
A little bit different.
Why would any C programmer care?
I think you've just explained it all. After all what does it matter if
these system headers - on which so many things depend - are a huge,
ugly, bloated unmaintainable mess?
Again, tell me why a C programmer would care?
Also, why would you (Bart) think that the Linux system is unmaintainable?
This is a man who claims he is writing a C compiler, and then asks
questions about aliasing in the Standard so trivial that even I can
answer them.

The working of his mind is not something that mortals should ponder.
BartC
2017-03-24 14:51:28 UTC
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Post by BartC
A little bit different.
Why would any C programmer care?
I think you've just explained it all. After all what does it matter if
these system headers - on which so many things depend - are a huge,
ugly, bloated unmaintainable mess?
Again, tell me why a C programmer would care?
Well, they might care when things go wrong and they have to delve into
the implementation headers to find out why.

They might care if they know that their code compiles on a wing and a
prayer, dependent on some very hairy sets of declarations.

They might care if they inadvertently use gcc or Linux dependencies, if
they want their code to compile on anything else (there's a dirent.h or
unistd.h for example).

Or if they actually make use, in user code, of some of those extra
definitions you find in Linux headers that don't exist elsewhere.
(And if they don't make use of them, then what's the point?)

Or they might just care if they have any sort of pride in their language
and how it's used. A lot of what I've seen is just abuse of it.

(I've compiled one of my projects, a 32Kloc interpreter when expressed
as C source. Compiled with gcc/Windows headers, an extra 20Kloc of
headers are processed.

With lccwin headers, it's about 1.5Kloc. With my own early headers, it
only needs about 150 lines of declarations. I haven't tried it on Linux
headers, as it stalls at some point.

Looking at it, the problem is in stdarg.h, which comes not from the
standard Linux headers but from gcc. I can't compile gcc's stdarg.h
because it uses gcc-specific features. I need to provide my own
stdarg.h at some point.
However, there is no reason to do it now because my project DOESN'T USE
STDARG.H! But it must get included from another header.)
--
bartc
Noob
2017-03-24 22:36:31 UTC
Post by BartC
They might care if they inadvertently use gcc or Linux dependencies, if
they want their code to compile on anything else (there's a dirent.h or
unistd.h for example).
gcc -std=c11 -Wall -pedantic -pedantic-errors
would catch most inadvertent uses of gcc extensions.
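[As a quick check of those flags - a sketch assuming gcc is available - a GNU statement expression is accepted under gcc's default dialect but rejected as an error under -pedantic-errors:

```shell
# ext.c uses a GNU statement expression, a gcc extension.
cat > ext.c <<'EOF'
int main(void) {
    int x = ({ int t = 1; t + 1; });   /* GNU extension */
    return x;
}
EOF
gcc -c ext.c -o ext.o                                   # accepted (default gnu dialect)
gcc -std=c11 -Wall -pedantic -pedantic-errors -c ext.c  # rejected: ISO C forbids braced-groups
```
]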

As for "Linux dependencies", you are confusing POSIX/SUS, glibc,
and the kernel proper, which is standard fare for you.
David Brown
2017-03-24 14:43:17 UTC
Post by BartC
Post by David Brown
Post by Ben Bacarisse
<snip>
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full
of '__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
Why do you think __builtin_expect is a Linux thing?
He doesn't know what it does, but deep down he knows it is probably
useful,
No, really it isn't. Not when you're trying to write code where you let
the compiler get on with /its/ job and you get on with yours.
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
First, read the manual - then you will have some basis for commenting:

<https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect>

__builtin_expect(x, y) has the value "x", but tells the compiler that
"x" is probably equal to "y". A typical use is the Linux macros:

#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)

Thus you can write:

if (likely(n > 3)) ...

and know that the compiler will use branch prediction instructions or
optimisation on the assumption that "n > 3" should be the fastest path.
Post by BartC
You don't want stuff like that cluttering up your code. If you're going
to those lengths, then you might as well write assembly code.
No, you use features like that specifically so that you /don't/ need to
write assembly code - but you can still guide the compiler with the aim
of getting the fastest possible code. When you are writing code that
should be as fast as you can reasonably make it, and you want it to be
portable to a wide range of processors (some of which have static branch
prediction bits in their jump instructions), then extensions like this
can be useful.

You don't "clutter up" your code with such features, but you may use
them occasionally.
Post by BartC
In this case, all I needed to know was whether a dummy macro for
__builtin_expect should be empty, or return some value, in order to get
around a stalled compile.
And the manual page will give you that information:

#define __builtin_expect(x, y) (x)
Post by BartC
But typical gcc and/or Linux code (that is, code using Linux headers) is
just chock-full of this stuff. A Linux system header includes another,
and that includes another, and that includes... it just goes on and on.
(Why do I care? Because when duplicating those headers under Windows for
easier development, I have to scour the Linux system for all those
dependent headers! Apparently in Linux they're not allowed to be all in
one place.)
That is because Linux - like most operating systems except Windows - has
system headers to let people write programs for their system. And these
system headers are /not/ the same as a compiler's standard headers, and
the compiler's standard headers are not the same as the C library's
headers. In the Windows world, these are usually thrown together in one
lump because the OS does not have standard headers, and because
compilers for Windows usually don't have a choice of libraries.
Post by BartC
Post by David Brown
documented in the obvious place (compiler builtins for gcc are
documented in the gcc manual, in the chapter called "builtins"), and
easily found using Google. That applies to pretty much everything he
has trouble with in Linux. Therefore, __builtin_expect must be a
"linuxy thing".
Linux system headers are more complicated than the equivalent under
Windows, even using gcc. Fact. A program containing 20 standard includes
will end up fetching 20-30 headers on Windows, apart from gcc, where 40
headers are required (and much bigger ones too).
The same program using Linux headers will fetch 80 includes.
So what?

When I use a header, I can't think of any reason to care how many
sub-headers are pulled in. The C standards don't define any
restrictions here. It makes no difference to /my/ code. System headers
are not where I look for information on functions or types, so I don't
need to refer to them.

If I were making a compiler, I would not care either. If I wanted my
compiler to work with implementation-specific headers for a different
compiler, then of course I would need to support any extensions used by
them. But that's all you need to do - the compiler you are making
presumably already supports include files that include other files.
BartC
2017-03-24 17:57:14 UTC
Post by David Brown
Post by BartC
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
<https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect>
__builtin_expect(x, y) has the value "x", but tells the compiler that
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
if (likely(n > 3)) ...
and know that the compiler will use branch prediction instructions or
optimisation on the assumption that "n > 3" should be the fastest path.
Suppose you get it wrong?

Anyway the real problem is writing things like __builtin_expect(....)
all over the place (and dozens of other examples), and then trying to
compile it on something that doesn't understand __builtin_expect. I
don't want to have to implement both C and half of gcc, even if the
latter just involves having to look all this stuff up and devising a
workaround.

(I don't know the provenance of this particular set of sources files,
and whether they were specifically configured for gcc, or whether they
could be configured for anything else. But anything that needs to be
configured for specific compilers would ring alarm bells if it would not
also work with a generic C compiler.)
Post by David Brown
#define __builtin_expect(x, y) (x)
It's not standard C, so I'm not interested. I just wanted to compile
some code not get involved in gcc yet again.
Post by David Brown
That is because Linux - like most operating systems except Windows - has
system headers to let people write programs for their system. And these
system headers are /not/ the same as a compiler's standard headers, and
the compiler's standard headers are not the same as the C library's
headers. In the Windows world, these are usually thrown together in one
lump because the OS does not have standard headers, and because
compilers for Windows usually don't have a choice of libraries.
I'm not talking about OS system headers, but the standard headers from
the C standard.

For Windows, I plan to use the C library provided in msvcrt.dll. For
Linux, I think the equivalent is libc.so.6.

The standard headers need to provide function signatures for functions
exported from those libraries. Plus define those other entities
mentioned in the C standard (NULL for example).

There doesn't seem to me to be much to it.

Now, to go beyond the C standard library, in Windows a big give-away is
the presence of windows.h, which I don't intend to duplicate at the
minute (I can borrow one from an existing compiler, but not gcc because
it's full of advanced gcc-specific features and is also up to 20 times
the size of the smallest).

In Linux, it's a bit trickier - perhaps headers such as dirent.h. The
dividing line between C functions and OS functions isn't as clear, and
people will probably use headers from either without even thinking about it.

This is why I'm only bothering with C sources either known to compile on
Windows with /any/ compiler, or that are designed to, and do, compile on
both OSes.

Supporting dirent.h etc can come later, however debugging what goes
wrong is hard if the only other compiler I can compare with happens to
be gcc.
Post by David Brown
Post by BartC
The same program using Linux headers will fetch 80 includes.
So what?
So 800 would be no problem? What about 8000? We are still talking about
STANDARD C. The library isn't that big, but you're not curious as to why
Linux and/or gcc make such a meal of it?
--
bartc
Scott Lurndal
2017-03-24 18:13:52 UTC
Post by BartC
Post by David Brown
Post by BartC
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
<https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect>
__builtin_expect(x, y) has the value "x", but tells the compiler that
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
if (likely(n > 3)) ...
and know that the compiler will use branch prediction instructions or
optimisation on the assumption that "n > 3" should be the fastest path.
Suppose you get it wrong?
The code generated by the compiler may not be quite as
efficient as it would otherwise be.
Post by BartC
Anyway the real problem is writing things like __builtin_expect(....)
all over the place (and dozens of other examples), and then trying to
compile it on something that doesn't understand __builtin_expect.
Well, the documentation for __builtin_expect notes what to
do in this case. Which is up to the programmer, not the compiler
developer.

In other words, don't worry about it: if someone uses your compiler
to compile code with GNU extensions, that's their problem.
Post by BartC
(I don't know the provenance of this particular set of sources files,
and whether they were specifically configured for gcc, or whether they
could be configured for anything else. But anything that needs to be
configured for specific compilers would ring alarm bells if it would not
also work with a generic C compiler.)
If the term 'gcc' or 'gnu' is in the path to the header file,
you can be sure that it is specific to the gcc compiler. Note
that linux supports gcc, clang, portland group compilers and the
intel compiler (all at the same time, on the same system) which
requires the non-gcc header files to be independent of the compilation
suite used.
Post by BartC
I'm not talking about OS system headers, but the standard headers from
the C standard.
Which are provided by the compiler vendor.
Post by BartC
For Windows, I plan to use the C library provided in msvcrt.dll. For
Linux, I think the equivalent is libc.so.6.
There is far more in libc than simply the C standard-defined run-time
functions.
Post by BartC
The standard headers need to provide function signatures for functions
exported from those libraries.
No, they need to provide function signatures for functions
defined by the standards. Which library the implementation is located
in is irrelevant.
Post by BartC
Plus define those other entities
mentioned in the C standard (NULL for example).
There doesn't seem to me to be much to it.
Get back to us when you can install, build and use your compiler on
dozens of wildly different operating systems.
Post by BartC
In Linux, it's a bit trickier - perhaps headers such as dirent.h. The
dividing line between C functions and OS functions isn't as clear,
It's quite clear. Posix incorporates by reference the C standard.
Post by BartC
So 800 would be no problem? What about 8000? We are still talking about
STANDARD C. The library isn't that big, but you're not curious as to why
Linux and/or gcc make such a meal of it?
We've told you several times _why_ the header files are arranged
as they are on unix/linux systems. You're not listening.

Very few _useful_ programs can be built using only STANDARD C
functionality.
Richard Heathfield
2017-03-24 18:32:50 UTC
Post by Scott Lurndal
Very few _useful_ programs can be built using only STANDARD C
functionality.
If you mean strictly conforming programs, I'm inclined to agree.

If, however, you mean clc-conforming programs, I would argue that there
are infinitely many useful programs that can be built using only
standard C functionality.

If you allow extensions as well, you get infinitely many /more/.

I do very occasionally make use of extensions, but it's surprisingly
rare that it's necessary.

(On the increasingly rare occasions that I write Windows GUI programs, I
decreasingly often use C, since C++ Builder is so much more convenient.)
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Keith Thompson
2017-03-24 19:11:55 UTC
Permalink
[...]
Post by Scott Lurndal
Post by BartC
I'm not talking about OS system headers, but the standard headers from
the C standard.
Which are provided by the compiler vendor.
Or by a separate library vendor.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2017-03-24 19:56:21 UTC
Permalink
Post by Scott Lurndal
Post by BartC
(I don't know the provenance of this particular set of sources files,
and whether they were specifically configured for gcc, or whether they
could be configured for anything else. But anything that needs to be
configured for specific compilers would ring alarm bells if it will not
also work with a generic C compiler.)
If the term 'gcc' or 'gnu' is in the path to the header file,
you can be sure that it is specific to the gcc compiler. Note
that linux supports gcc, clang, portland group compilers and the
intel compiler (all at the same time, on the same system) which
requires the non-gcc header files to be independent of the compilation
suite used.
Post by BartC
I'm not talking about OS system headers, but the standard headers from
the C standard.
Which are provided by the compiler vendor.
stdio.h exists in /usr/include/. There is also a version in
/usr/include/x64_64-linux-gnu/bits/

So the OS is providing stdio.h, and gcc tells me that is the one it uses
(the bits directory isn't directly specified in the include search path).

In the case of clang for Windows, it doesn't provide any headers at all,
but expects to use the ones from MSVC (and previously used the ones for
gcc/mingw).
Post by Scott Lurndal
Post by BartC
There doesn't seem to me to be much to it.
Get back to us when you can install, build and use your compiler on
dozens of wildly different operating systems.
MSVC seems to be rather heavyweight too (see below); how many OSes and
platforms does that work with?

Anyway I'm not sure why this is relevant: does the definition of
printf() differ wildly across OSes? And if there are platform-specific
lines, does every copy of every Linux have to contain code for every
conceivable platform? Sounds like a 'configure' script is needed - or is
this /after/ running it?!

Take this program:

#include <assert.h>
//#include <complex.h>
#include <ctype.h>
#include <errno.h>
#include <fenv.h>
#include <float.h>
#include <inttypes.h>
//#include <iso646.h>
#include <limits.h>
#include <locale.h>
#include <math.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <wchar.h>
#include <wctype.h>

Preprocessing this requires processing these source lines and involves
these total header files, using the following C implementations to
provide the files:

lccwin 3.3K 24 unique include files
Pelles C 3.5K 22
DMC 4.7K 22
Tiny C 11K 31
gcc/Win 27K 44
gcc/Linux 28K 89 Most headers from /usr/include
MSVC2008 47K 24 Excludes stdint/inttypes/fenv/stdbool

(This old MSVC doesn't support C99 headers.)

What does this show? Probably nothing, except that some implementations
have more and/or bigger files than others. There's no indication that
those extra lines are needed or serve any benefit.

Both MS and gcc are worked on by lots of people, presumably none of whom
is responsible for taking stuff out, and simplifying.
Post by Scott Lurndal
We've told you several times _why_ the header files are arranged
as they are on unix/linux systems. You're not listening.
Very few _useful_ programs can be built using only STANDARD C
functionality.
Yeah, only programs that take input from keyboard and/or file, and write
output to screen and/or disk. Which includes C compilers.
--
bartc
Keith Thompson
2017-03-24 20:48:20 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
stdio.h exists in /usr/include/. There is also a version in
/usr/include/x64_64-linux-gnu/bits/
It's /usr/include/x86_64-linux-gnu/bits/,
not /usr/include/x64_64-linux-gnu/bits/

But that second one is just another file that happens to have the same
name. On my system, it includes the following lines:

#ifndef _STDIO_H
# error "Never include <bits/stdio.h> directly; use <stdio.h> instead."
#endif

It's an *internal* file, part of the implementation, not something that
any C programmer needs to care about. It doesn't provide, for example,
a declaration of printf, just some low-level declarations that are used
by the real <stdio.h>. Apparently the designers found it useful to use
parallel file names (that directory contains several other files whose
names match those of standard headers).
Post by BartC
So the OS is providing stdio.h, and gcc tells me that is the one it uses
(the bits directory isn't directly specified in the include search path).
Actually it's provided by the libc6-dev package, at least on my Ubuntu
system -- but I think that package is preinstalled, so you could say
it's provided by the OS.

[...]
Post by BartC
Anyway I'm not sure why this is relevant: does the definition of
printf() differ wildly across OSes?
(I'm sure you mean the *declaration* of printf.) It doesn't vary a
whole lot, though some systems will probably have to make the "restrict"
optional somehow to allow for pre-C99 compilers.

But the definition of type FILE probably does vary a great deal.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2017-03-24 21:17:47 UTC
Permalink
Post by Keith Thompson
(I'm sure you mean the *declaration* of printf.) It doesn't vary a
whole lot, though some systems will probably have to make the "restrict"
optional somehow to allow for pre-C99 compilers.
But the definition of type FILE probably does vary a great deal.
I tried to have a look, but after several minutes I couldn't even find
what FILE was defined as! They seem intent on making this impenetrable.

I worked out that FILE was an alias for 'struct _IO_FILE', and that that
was probably defined in libio.h.

Oh, hang on, I think I've just found it. _IO_FILE occurs quite a lot in
libio.h. And for good measure, it also defines:

typedef struct _IO_FILE _IO_FILE;

just to keep you on your toes. Here's the version in my fledgling stdio.h:

typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;

Clearly I have a lot to learn about making it hard to unravel!

But the main requirement is that this struct takes up 48 bytes in 64-bit
Windows (because stdin/out/err are defined as entries in an array of
such structs). So I know now that this won't work under Linux as it is.
I'll have to see how to make that possible without /my/ headers
suffering the same fate of drowning in a sea of macros and conditionals.
--
bartc
jacobnavia
2017-03-24 21:40:59 UTC
Permalink
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Clearly I have a lot to learn about making it hard to unravel!
I have another definition, that's even better:
typedef struct __FILE FILE;

And "struct __FILE" is defined NOWHERE.

Isn't that great?

You just can't access any of the fields of that structure, it is an
opaque structure. This
1) Keeps current practice. I have never seen anyone using any of those
fields directly.
2) It is concise and not at all verbose. Just ONE LINE!
BartC
2017-03-24 21:56:35 UTC
Permalink
Post by jacobnavia
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Clearly I have a lot to learn about making it hard to unravel!
typedef struct __FILE FILE;
And "struct __FILE" is defined NOWHERE.
Isn't that great?
You just can't access any of the fields of that structure, it is an
opaque structure. This
1) Keeps current practice. I have never seen anyone using any of those
fields directly.
2) It is concise and not at all verbose. Just ONE LINE!
Well, the contents of the struct don't matter, just the size. So they
could be replaced by char[48] in this case. Except that, if the same
declaration is used for a 32-bit target, the size will then be wrong.

The size is needed, as far as I can make out, in order to properly deal
with the _iob array of FILE exported by msvcrt.dll, or by the exported
_iob_func() function which returns a pointer to that array. This array
gives you stdin, stdout and stderr. (There wasn't any other way of
getting those, last time I asked.)

Of course I could just have the opaque FILE type, and hard-code the _iob
array stride as 48 bytes or whatever it would be for 32-bits. It's
hardly going to change any time soon.
--
bartc
jacobnavia
2017-03-25 00:14:20 UTC
Permalink
Post by BartC
Well, the contents of the struct don't matter, just the size. So they
could be replaced by char[48] in this case. Except that, if the same
declaration is used for a 32-bit target, the size will then be wrong.
No. We never use the structure directly; we use a FILE *. The size of the
structure does NOT matter.

You see the difference?

A file POINTER, pointing to an unknown structure that is returned by
fopen, and should be disposed of with fclose. The function fread() uses
that pointer, but nowhere are any fields of that pointer specified. That
is a slightly different thing from

char[48];

All code that follows those rules is portable to any C compiler. And I
have never seen any code that doesn't follow those rules.

The implementation can add any required fields at any time, without
affecting old code that continues to run.

New code can use new functionalities that could be added to fopen.
For instance opening URLs with it, and abstracting away the network layer.

FILE *fopen( URL, "rn|");

This would start buffering the URL's data on disk, with the buffer size
determined by NET_BUFSIZ in stdio.h. The optional vertical bar could mean non-locking.
A NULL would mean that the host or the network is down. An open
connection would have a non NULL result, that can be used by other
functions.

fread would keep the same parameters, as would fclose. An easy way to abstract
the network into C.

the "system" function could use URLs too, by starting a command in a
remote machine.

Would this be more useful than a new network layer?

What was the fate of the proposed network layer some years ago by the C
committee?
BartC
2017-03-25 01:19:05 UTC
Permalink
Post by jacobnavia
Post by BartC
Well, the contents of the struct don't matter, just the size. So they
could be replaced by char[48] in this case. Except that, if the same
declaration is used for a 32-bit target, the size will then be wrong.
No. We never use the structure directly we use a FILE *. The size of the
structure does NOT matter.
You see the difference?
The problem is here:

extern FILE _iob[];

#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])

The array calculation requires the size of FILE to be correct. How do we
get the right size of FILE? By creating a struct with the right pattern
of members (of size 44 bytes but rounded to 48 - on Win64).

However after that has been determined, then the struct can be
discarded, an opaque type used for FILE, and a different mechanism used
to get the right offsets from _iob.

But since the FILE struct already exists, it might as well stay.

(The above is for Windows; in Linux, stdout for example is defined as a
pointer to FILE (or struct _IO_FILE). Hopefully these reside in the C
library and have already been initialised, and don't require
initialisation code to be invoked by my program, as I have no idea
how to do that.

But, I plan to access stdout etc via dynamic linking. It's not clear
whether the resulting values even belong to my process. A test using
dynamic access to _iob worked in Windows; a similar test under Linux
failed. This might be a bit of a stumbling block...)
--
bartc
BartC
2017-03-25 01:30:08 UTC
Permalink
Post by BartC
(The above is for Windows; in Linux, stdout for example is defined as a
pointer to FILE (or struct _IO_FILE). Hopefully these reside in the C
library and have already been initialised, and don't require
initialisation code to be invoked by my program, as I have no idea
how to do that.
But, I plan to access stdout etc via dynamic linking. It's not clear
whether the resulting values even belong to my process. A test using
dynamic access to _iob worked in Windows; a similar test under Linux
failed. This might be a bit of a stumbling block...)
I've got it to work, for the time being. The value returned by
dlsym(lib,"stdout") is a pointer to stdout, not stdout itself.

It seems to work either with fputs directly (belonging to the current
process), or a fputs obtained via dlsym().
s***@casperkitty.com
2017-03-25 19:46:34 UTC
Permalink
Post by BartC
extern FILE _iob[];
#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])
If stdio.h defines stdin etc. in those ways, it will need to know the size
of the objects in question. Likewise, if an implementation were to say

inline int getc(FILE * restrict f)
{
if (f->simpleGetsLeft > 0)
{
f->simpleGetsLeft--;
return (unsigned char)*f->simpleGetPtr++;
}
else
return __do_getc(f);
}

it would need to have the complete definition of FILE available to fetch
those fields. Headers like <stdio.h> are expected to have aspects which are
customized to fit the underlying library implementation. They're not
required to do so, but performance can be improved when they are [e.g. a
compiler might be able to optimize a loop which calls getc() multiple times
so that it can process up to f->simpleGetsLeft characters at once].
BartC
2017-03-25 20:10:46 UTC
Permalink
Post by s***@casperkitty.com
Post by BartC
extern FILE _iob[];
#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])
If stdio.h defines stdin etc. in those ways, it will need to know the size
of the objects in question.
It's determined by the library. For MSVCRT.DLL, the only way I've found
is to somehow obtain the address of the _iob array which is apparently a
set of FILE structs.

Alternatively the implementation could decide to do things entirely
differently, such as map stdout to some handle from Win32. Then all C
library functions need to be revised to be based on Win32 functions.

However, I don't want to rewrite the C library, as the existing ones
seem perfectly adequate. So far.
Post by s***@casperkitty.com
it would need to have the complete definition of FILE available to fetch
those fields. Headers like <stdio.h> are expected to have aspects which are
customized to fit the underlying library implementation. They're not
required to do so, but performance can be improved when they are [e.g. a
compiler might be able to optimize a loop which calls getc() multiple times
so that it can process up to f->simpleGetsLeft characters at once].
I wouldn't rely on an implementation having a fast getc(); I would just
load the entire file in one go. (And ignore the nonsense about files
possibly being too big to load; my RAM is sufficient to hold 100,000
copies of the typical size of file I use.)

Anyway, how much of an overhead would a function call actually be: what
MB/second throughputs can people get with a macro getc(), and how does
that compare with a function getc()?
--
bartc
Gordon Burditt
2017-03-27 08:34:48 UTC
Permalink
Post by BartC
extern FILE _iob[];
There's no reason to put this internal stuff in <stdio.h>, except
perhaps historical reasons.
Post by BartC
But since the FILE struct already exists, it might as well stay.
(The above is for Windows; in Linux, stdout for example is defined as a
pointer to FILE (or struct _IO_FILE). Hopefully these reside in the C
library and have already been initialised, and don't require
initialisation code to be be invoked by my program, as I have no idea
how to do that.
C doesn't like to define symbols like stdin to have linkage directly
since it shouldn't collide with a user's names if he DOESN'T include
<stdio.h> in a particular source file. Thus we have the defines below,
and reserved variable names __stdinp, __stdoutp, and __stderrp instead
of stdin, stdout, and stderr directly:


In stdio.h:

/* You don't need a definition of FILE with a correct size here. */
/* Use an opaque type */
#define stdin __stdinp
#define stdout __stdoutp
#define stderr __stderrp
extern FILE * __stdinp;
extern FILE * __stdoutp;
extern FILE * __stderrp;

In fopen.c or some other source code file for the C library:

static FILE _iob[__MAX_FILES];

FILE * __stdinp = &_iob[0];
FILE * __stdoutp = &_iob[1];
FILE * __stderrp = &_iob[2];

You don't need any initialization code in your program for this to work.

_iob is static here so you can't reference it from user code, as it should be.
It should be renamed to something nobody can spell, like _supercalifragilisticexpalidociouss.
Note: since the trend is for __MAX_FILES to be large, like 1000, and
not necessarily a fixed limit, perhaps this should be dynamically allocated.
FreeBSD breaks up the array: the first 3 are statically allocated (and
initialized - I didn't show that here), and the rest are dynamic.
Post by BartC
But, I plan to access stdout etc via dynamic linking.
Windows has some weirdness that sometimes requires that variables be put in a DLL or
not be put in a DLL.
Linux doesn't.
Post by BartC
It's not clear whether the resulting values even belong to my process.
If they do not, you can't get to them. Period. The C library (excluding
the kernel code run because of system calls) runs in user process space.
Post by BartC
A test using
dynamic access to _iob worked in Windows; a similar test under Linux
failed. This might be a bit of a stumbling block...)
Was this with the header file that referenced &_iob[1] and had the wrong
size for a FILE? That would explain it. It's possible that _iob was
made static or renamed. You shouldn't be referencing it anyway other
than using definitions in <stdio.h>.
BartC
2017-03-27 10:41:56 UTC
Permalink
Post by Gordon Burditt
Was this with the header file that referenced &_iob[1] and had the wrong
size for a FILE? That would explain it. It's possible that _iob was
made static or renamed. You shouldn't be referencing it anyway other
than using definitions in <stdio.h>.
I was under the impression that functions and things exported from
shared libraries could be accessed from any language using the right
ABI, not just the original language (eg. C).

Then it should be possible to do that without requiring a 'stdio.h' file
or involving a C compiler.

But it does require equivalent definitions in whatever language /is/
being used, and that means /knowing/ what is going on.

So, under Windows, using stdout (which might be needed to make use of
fprintf) requires that stdout is a pointer to the second element of some
array exported by the C shared library. Under Linux, from what I've
seen, stdout is itself directly exported from the C shared library.

(I'm guessing that the dlopen() function on the Linux library creates
new, private instances of the exported variables, and an internal
initialisation routine sets them up.)

(Did you yourself mention using this stuff from Perl, in another post? I
guess Perl wasn't using stdio.h.)
--
bartc
Gordon Burditt
2017-03-27 19:48:25 UTC
Permalink
Post by BartC
I was under the impression that functions and things exported from
shared libraries could be accessed from any language using the right
ABI, not just the original language (eg. C).
Then you need the appropriate headers for that language.

If the language doesn't come with the system, the language writer,
not the system writer, may be responsible for supplying header
files. Consider, for example, PL/1 or COBOL for UNIX. I see
no COBOL header files here.

A different language may have difficulties with some of the C names,
for example, not allowing _ in a variable name or function name.
FORTRAN had a different convention than the leading _; it may have
been a *trailing* _. Also, the C linker may not like some characters
in names that the other language uses.
Post by BartC
Then it should be possible to do that without requiring a 'stdio.h' file
or involving a C compiler.
If the language requires function declarations (and perhaps wrapper functions
if the calling conventions aren't compatible), then it needs something
equivalent to <stdio.h> for that language.
Post by BartC
But it does require equivalent definitions in whatever language /is/
being used, and that means /knowing/ what is going on.
Some languages provide means of automatically constructing <whatever
language> header files from C header files (e.g. Perl). Other
languages use wrapper functions and change the interface, for
example, you might pass a Perl or FORTRAN I/O handle instead of the
C stdout, and the wrapper takes care of the difference (A FORTRAN
or Perl I/O handle contains or can figure out a stdio stream to
use).

As I recall, at least one implementation of FORTRAN on UNIX had a
wrapper function for just about everything, including all the math
functions, a different one for every precision, and I/O because the
function/subroutine linkage wasn't the same. Also, FORTRAN I/O
wasn't just calling C functions with slightly different names. As
I recall, there were some funky rules about mixing C and FORTRAN
I/O in the same program, which I was trying to do. It seems they
each have their own set of buffers.
Post by BartC
So, under Windows, using stdout (which might be needed to make use of
fprintf) requires that stdout is a pointer to the second element of some
array exported by the C shared library. Under Linux, from what I've
seen, stdout is itself directly exported from the C shared library.
If you export a pointer to stdout, you don't need to export the fact
that the FILE struct is a member of an array, the name of the array
(if the array even has a name ... it may be allocated with malloc() ),
the size of a FILE struct, or the fact that it's the second element.

Perhaps the easiest way to export something like stdout is to compile
a 3-line file like this as C:

---------------
#include <stdio.h>

FILE *pointerToSTDOUT = stdout;
---------------
and link it with the other-language code (assuming that pointer
format and variable names are compatible) which uses "extern
pointerToSTDOUT" or whatever the equivalent is in that language.
You don't have to worry about whether _ is a legal character in
your language, nor the contents of <stdio.h>. Just treat it as an
opaque pointer that you pass to fprintf() or whatever.
Post by BartC
(I'm guessing that the dlopen() function on the Linux library creates
new, private instances of the exported variables, and an internal
initialisation routine sets them up.)
Assume you have a private library you can require in addition to the
C library. I think there is a way for "extern FILE *stdout;" in
the main program to match up with "FILE *stdout;" in the private library
as one variable if you use RTLD_GLOBAL. It also works this way if
you use shared libraries without dlopen().

Attempting to dlopen() the C library is a little bit pointless.
Guess where the dlopen() code is? You need to have it already
linked to call dlopen(). Well, it is possible to dlopen() the
executable itself and read its symbols, so it might have some use.

With RTLD_LOCAL, you can dlopen() multiple shared objects and each
may have functions and variables private to the library and with
duplicate names. In particular, each library may have _init and
_fini functions, which get automatically called and can do initializations
(like, say, C++ constructors).
Post by BartC
(Did you yourself mention using this stuff from Perl, in another post? I
guess Perl wasn't using stdio.h.)
I mentioned that the C code from Perl used code that accessed the
fields of a FILE that it wasn't really supposed to. Yes, the Perl
C code was using <stdio.h>. (I was building Perl, not using it,
watching the Configure script test whether this worked. And in one
case, it failed with spectacular error messages but went on to build
a working Perl). However, Perl does have a way of converting C
header files to Perl header files. There's a tree of Perl *.ph
files, one for each C include file, hidden under the site_perl directory.
It even translates all the #if's!

In C++, you can (and have to) tell it to use C rather than C++
calling conventions. extern "C" { ... }
s***@casperkitty.com
2017-03-27 20:31:35 UTC
Permalink
Post by Gordon Burditt
If you export a pointer to stdout, you don't need to export the fact
that the FILE struct is a member of an array, the name of the array
(if the array even has a name ... it may be allocated with malloc() ),
the size of a FILE struct, or the fact that it's the second element.
Perhaps the easiest way to export something like stdout is to compile
---------------
#include <stdio.h>
FILE *pointerToSTDOUT = stdout;
---------------
and link it with the other-language code (assuming that pointer
format and variable names are compatible) which uses "extern
pointerToSTDOUT or whatever the equivalent is in that language.
That approach may cause an extra layer of indirection compared with what
would be necessary if e.g. stdout reported the address of an element within
an exported FILE[]. Some C compilers provide means of exporting symbols
with particular addresses, but there's no consistent syntax. It would have
been helpful if the Standard had specifies a syntax for exporting a symbol
with a pointer-constant address, subject to Implementation-Defined
constraints. Such constraints might allow any C pointer-constant expression,
might not allow any, or might e.g. allow only pointer constants which do not
depend upon any imported symbols.

If an implementation were to accept:

uint32_t volatile MoeLarryCurly[3];
uint32_t volatile Moe : MoeLarryCurly[0]; // Syntax with no other meaning
uint32_t volatile Larry : MoeLarryCurly[1];
uint32_t volatile Curly : MoeLarryCurly[2];

then the behavior of accessing "Larry" would be equivalent to that of
accessing MoeLarryCurly[1]. Systems would be required to reject such
code if the linker wouldn't be able to achieve the required semantics,
but enough linkers can handle such things that it shouldn't be necessary
to use assembly language to accomplish them.
bartc
2017-03-27 21:22:57 UTC
Permalink
Post by Gordon Burditt
Post by BartC
I was under the impression that functions and things exported from
shared libraries could be accessed from any language using the right
ABI, not just the original language (eg. C).
Then you need the appropriate headers for that language.
Yes, that's what I'm trying to create.
Post by Gordon Burditt
If the language doesn't come with the system, the language writer,
not the system writer, may be responsible for supplying header
files.
What do you mean by the language writer? If you mean the implementor,
then he will need the information necessary in order to be able to write
the headers (or the equivalent in the language).

I'm the implementor. I'm using a variety of sources including existing C
headers in order to glean enough info to do the job, but solutions
/requiring/ you to use stdio.h and /assuming/ you're going to use an
existing C compiler aren't much good.

I might, for example, want to be able to use C's file functions from
assembly code.
Post by Gordon Burditt
Post by BartC
Then it should be possible to do that without requiring a 'stdio.h' file
or involving a C compiler.
If the language requires function declarations (and perhaps wrapper functions
if the calling conventions aren't compatible), then it needs something
equivalent to <stdio.h> for that language.
Suppose the problem is to execute the equivalent of this:

fprintf(stdout, "hello");

but to do it from assembly as I suggested above. Your only tools are an
assembler and a linker that can manage the connection between your ASM
code and a shared library of either msvcrt.dll or libc.so.6. How do you
do it?

'fprintf' is just an address, so it easy. What about 'stdout'? Where do
you find the information that tells you where to get 'stdout'? (Windows
and Linux manage this differently.)
Post by Gordon Burditt
Perhaps the easiest way to export something like stdout is to compile
---------------
#include <stdio.h>
FILE *pointerToSTDOUT = stdout;
---------------
and link it with the other-language code (assuming that pointer
format and variable names are compatible) which uses "extern
pointerToSTDOUT" or whatever the equivalent is in that language.
You don't have to worry about whether _ is a legal character in
your language, nor the contents of <stdio.h>. Just treat it as an
opaque pointer that you pass to fprintf() or whatever.
I'm writing a C compiler. I can't make it half-dependent on another C
compiler for it to work! How did the authors of that other compiler
manage it - did they depend on yet another compiler, so that there is a
chain of dozen compilers? (Like 5 buses following a sixth because only
the one in front knows the way!)

(The only concession I can make, for people who want to compile from
source, is to require an existing C compiler to bootstrap, then it can
be set aside.)
Post by Gordon Burditt
Post by BartC
(I'm guessing that the dlopen() function on the Linux library creates
new, private instances of the exported variables, and an internal
initialisation routine sets them up.)
Assume you have a private library you can require in addition to the
C library. I think there is a way for "extern FILE *stdout;" in
the main program to match up with "FILE *stdout;" in the private library
as one variable if you use RTLD_GLOBAL. It also works this way if
you use shared libraries without dlopen().
Attempting to dlopen() the C library is a little bit pointless.
Guess where the dlopen() code is? You need to have it already
linked to call dlopen().
The point of opening it is to get a handle to pass to dlsym(). But also,
when I get into it, to ensure that I'm using a different invocation of
the library from the program that is being executed.

(One use of this is when the primary executable is an interpreter of a
language which can itself directly call functions in a library such as
C. Any local state within its use of the library needs to be kept
separate from that of the host.

Another, which I'm doing now, is a program which converts C source to
native code to be directly executed. It will need its own private fixups
to an external C library, separate from its host. The host may also need
to stay resident while different versions of the target program are run.)

Post by Gordon Burditt
Well, it is possible to dlopen() the
executable itself and read its symbols, so it might have some use.
Is there a dl function to enumerate all the symbols rather than look for
a specific one? (Sometimes you won't know which library a symbol might
be in.)
Post by Gordon Burditt
Post by BartC
(Did you yourself mention using this stuff from Perl, in another post? I
guess Perl wasn't using stdio.h.)
I mentioned that the C code from Perl used code that accessed the
fields of a FILE that it wasn't really supposed to. Yes, the Perl
C code was using <stdio.h>. (I was building Perl, not using it,
watching the Configure script test whether this worked. And in one
case, it failed with spectacular error messages but went on to build
a working Perl). However, Perl does have a way of converting C
header files to Perl header files. There's a tree of Perl *.ph
files, one for each C include file, hidden under the site_perl directory.
It even translates all the #if's!
I wonder what it converts them to, unless .ph files are not very
different from .h files.

(With my own such language, converting .h files is not easy, as there is
no equivalent in the target language! It works completely differently.)
--
bart



---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
s***@casperkitty.com
2017-03-27 22:05:23 UTC
Permalink
Post by bartc
I'm writing a C compiler. I can't make it half-dependent on another C
compiler for it to work! How did the authors of that other compiler
manage it - did they depend on yet another compiler, so that there is a
chain of dozen compilers? (Like 5 buses following a sixth because only
the one in front knows the way!)
There are two approaches a C implementation may take to handling things like
the FILE type:

1. Define FILE however it likes, and then implement functions like fopen,
fwrite, etc. which use those fields however they see fit, and in turn
call some OS routines that never receive a FILE, but will likely be passed
one or more members from it. The OS routine would have no reason to
know or care how the C implementation defined FILE.

2. Define FILE in a fashion consistent with a file-control structure used
by an underlying operating system.

If one wants to use a C implementation to do the equivalent of
fprintf(stdout, "hello"); one should write a compilation unit like:

#include <stdarg.h>
#include <stdio.h>
int do_fprintf(FILE *f, char const *restrict fmt, ...)
{
va_list vp;
va_start(vp, fmt);
int result = vfprintf(f, fmt, vp);
va_end(vp);
return result;
}
FILE *get_stdout(void)
{
return stdout;
}

which will export two functions which can be invoked from within assembly
language. C implementations may omit library code which is not used within
a C file, or may select from among different implementations of things like
fprintf() based upon the ways they see it invoked. If a C implementation
sees that none of the printf functions receive any format specifier more
complicated than %d or %u, it may link in an implementation that omits other
specifiers. If a machine-code function tries to call printf with other kinds
of format specifiers, the linked implementation might not work. If code
chains to vfprintf from an exported function, however, the implementation
would know that it would need to include an "fprintf" that could handle
anything.
bartc
2017-03-27 22:22:19 UTC
Permalink
Post by s***@casperkitty.com
Post by bartc
I'm writing a C compiler. I can't make it half-dependent on another C
compiler for it to work! How did the authors of that other compiler
manage it - did they depend on yet another compiler, so that there is a
chain of dozen compilers? (Like 5 buses following a sixth because only
the one in front knows the way!)
There are two approaches a C implementation may take to handling things like
1. Define FILE however it likes, and then implement functions like fopen,
fwrite, etc. which use those fields however they see fit, and in turn
call some OS routines that never receive a FILE, but will likely be passed
one or more members from it. The OS routine would have no reason to
know or care how the C implementation defined FILE.
2. Define FILE in a fashion consistent with a file-control structure used
by an underlying operating system.
Yes, one way is to write an entirely new C library, then it doesn't
matter how it's done. But there are some problems:

(1) It's a huge amount of work

(2) It will likely be tied to one OS, so I might need one for each target OS

(3) I don't know if there are going to be clashes when my code using its
own library interacts with functions in external shared libraries using
a more standard one.

(4) I don't want to do it and hadn't planned to. There are perfectly
good existing libraries, which already exist on my target OSes, and this
is what I'd intended to use. (Actually my interpreted stuff already
makes use of them.)
Post by s***@casperkitty.com
If one wants to use a C implementation to do the equivalent of
#include <stdarg.h>
#include <stdio.h>
int do_fprintf(FILE *f, char const *restrict fmt, ...)
{
va_list vp;
va_start(vp, fmt);
int result = vfprintf(f, fmt, vp);
va_end(vp);
return result;
}
FILE *get_stdout(void)
{
return stdout;
}
which will export two functions which can be invoked from within assembly
language.
Imagine if the C language didn't exist. But you still have a binary API
that works remarkably similarly to the current C library. How could you
call such functions?

And in fact the fprintf example wasn't concerned with variadic
arguments, but getting access to the stdout used by the shared library.

This is surprisingly difficult to obtain (outside of just using C).
Sometimes you can get a file handle which does the same job, but it's
not stdout. So if you passed it to a function which /did/ have access to
the real thing, then f==stdout would be false.
Post by s***@casperkitty.com
C implementations may omit library code which is not used within
a C file, or may select from among different implementations of things like
fprintf() based upon the ways they see it invoked. If a C implementation
sees that none of the printf functions receive any format specifier more
complicated than %d or %u, it may link in an implementation that omits other
specifiers. If a machine-code function tries to call printf with other kinds
of format specifiers, the linked implementation might not work. If code
chains to vfprintf from an exported function, however, the implementation
would know that it would need to include an "fprintf" that could handle
anything.
Well, hopefully I won't be implementing any printf functions!
--
bartc

jacobnavia
2017-03-27 22:35:34 UTC
Permalink
Post by bartc
Yes, one way is to write an entirely new C library, then it doesn't
(1) It's a huge amount of work
Yes. I did that, and it is true. It is a HUGE amount of work.
Post by bartc
(2) It will likely be tied to one OS, so I might need one for each target OS
Yes. You have to port it to each new OS that you support.
Post by bartc
(3) I don't know if there are going to be clashes when my code using its
own library interacts with functions in external shared libraries using
a more standard one.
The new library will be incompatible with any existing library.
Post by bartc
(4) I don't want to do it and hadn't planned to. There are perfectly
good existing libraries, which already exist on my target OSes, and this
is what I'd intended to use. (Actually my interpreted stuff already
makes use of them.)
Use CRTDLL.DLL then, and libc in linux. I am doing that under linux for
the time being. Since the implementation of the library is open source I
can use it and hack it at will.
Post by bartc
Well, hopefully I won't be implementing any printf functions!
I did implement printf, since there is no compiler around that will
understand 480-bit floats, or 128-bit integers, etc.
s***@casperkitty.com
2017-03-27 23:00:09 UTC
Permalink
Post by bartc
Yes, one way is to write an entirely new C library, then it doesn't
(1) It's a huge amount of work
(2) It will likely be tied to one OS, so I might need one for each target OS
That would depend upon how similar the target platforms are, and upon what
trade-offs you make between speed and portability.
Post by bartc
(3) I don't know if there are going to be clashes when my code using its
own library interacts with functions in external shared libraries using
a more standard one.
If you want to exchange pointers to FILE with existing code that uses some
library that defines such a type, then use that library.
Post by bartc
(4) I don't want to do it and hadn't planned to. There are perfectly
good existing libraries, which already exist on my target OSes, and this
is what I'd intended to use. (Actually my interpreted stuff already
makes use of them.)
Post by s***@casperkitty.com
If one wants to use a C implementation to do the equivalent of
#include <stdarg.h>
#include <stdio.h>
int do_fprintf(FILE *f, char const *restrict fmt, ...)
{
va_list vp;
va_start(vp, fmt);
int result = vfprintf(f, fmt, vp);
va_end(vp);
return result;
}
FILE *get_stdout(void)
{
return stdout;
}
which will export two functions which can be invoked from within assembly
language.
Imagine if the C language didn't exist. But you still have a binary API
that works remarkably similarly to the current C library. How could you
call such functions?
In whatever function is defined by the API.
Post by bartc
And in fact the fprintf example wasn't concerned with variadic
arguments, but getting access to the stdout used by the shared library.
This is surprisingly difficult to obtain (outside of just using C).
Sometimes you can get a file handle which does the same job, but it's
not stdout. So if you passed it to a function which /did/ have access to
the real thing, then f==stdout would be false.
The difficulty is that there is no defined way in C for code to export any
symbol which isn't the direct result of a compile-time-constant-size
allocation, but some implementations may define stdout in a fashion that
yields a link-time constant address which has some particular relation to
other such addresses, and the only way to do that is via macro.
Post by bartc
Post by s***@casperkitty.com
C implementations may omit library code which is not used within
a C file, or may select from among different implementations of things like
fprintf() based upon the ways they see it invoked. If a C implementation
sees that none of the printf functions receive any format specifier more
complicated than %d or %u, it may link in an implementation that omits other
specifiers. If a machine-code function tries to call printf with other kinds
of format specifiers, the linked implementation might not work. If code
chains to vfprintf from an exported function, however, the implementation
would know that it would need to include an "fprintf" that could handle
anything.
Well, hopefully I won't be implementing any printf functions!
The same principle applies with other functions as well.
bartc
2017-03-28 12:24:28 UTC
Permalink
Post by s***@casperkitty.com
Post by bartc
Yes, one way is to write an entirely new C library, then it doesn't
(1) It's a huge amount of work
(2) It will likely be tied to one OS, so I might need one for each target OS
That would depend upon how similar the target platforms are, and upon what
trade-offs you make between speed and portability.
One will be Win32, the other will be Linux.

On Win32, the file API is easy to recognise (CreateFile() etc, all a
pain to use, but only needs to be done once).

On Linux, I actually don't know. Is it POSIX and functions such as
open() instead of fopen()? The first example I looked at started with
#include <stdio.h> which presumably contains fopen as well.

Another source says the "C Standard Library ('libc') provides ... open,
read and write." Really, they are C standard functions?

This is why I keep saying it's difficult to split up what's part of
Linux, and what is part of C (sometimes I say it is Linux and gcc that
are hard to split; certainly Linux, C and gcc seem very cosy).

As for the relationship between Linux and Unix and whether any Linux
stuff will work on the latter, I've no idea.

Anyway, the Windows and Linux platforms are quite different. I would
expect to write libraries in other areas that need to work on both, but
not for things already covered by the standard C library.
Post by s***@casperkitty.com
Post by bartc
This is surprisingly difficult to obtain (outside of just using C).
Sometimes you can get a file handle which does the same job, but it's
not stdout. So if you passed it to a function which /did/ have access to
the real thing, then f==stdout would be false.
The difficulty is that there is no defined way in C for code to export any
symbol which isn't the direct result of a compile-time-constant-size
allocation, but some implementations may define stdout in a fashion that
yields a link-time constant address which has some particular relation to
other such addresses, and the only way to do that is via macro.
CLOCKS_PER_SECOND is another tricky one. I just assumed it was 1000 on
Windows and 1000000 on Linux, but a recent test seemed to indicate it
was 1000 on Linux too (despite bits/time.h suggesting it should be 1000000).

It's a ridiculous thing to be a stumbling block. But then it always is.

(In my own languages, the equivalent of STDIN and STDOUT is just the
constant 0. Thats a little bit easier than all this malarky necessary
with C.)
--
Bartc

David Brown
2017-03-28 13:20:28 UTC
Permalink
Post by bartc
Post by s***@casperkitty.com
Post by bartc
Yes, one way is to write an entirely new C library, then it doesn't
(1) It's a huge amount of work
(2) It will likely be tied to one OS, so I might need one for each target OS
That would depend upon how similar the target platforms are, and upon what
trade-offs you make between speed and portability.
One will be Win32, the other will be Linux.
On Win32, the file API is easy to recognise (CreateFile() etc, all a
pain to use, but only needs to be done once).
On Linux, I actually don't know. Is it POSIX and functions such as
open() instead of fopen()? The first example I looked at started with
#include <stdio.h> which presumably contains fopen as well.
Another source says the "C Standard Library ('libc') provides ... open,
read and write." Really, they are C standard functions?
This is why I keep saying it's difficult to split up what's part of
Linux, and what is part of C (sometimes I say it is Linux and gcc that
are hard to split; certainly Linux, C and gcc seem very cosy).
As for the relationship between Linux and Unix and whether any Linux
stuff will work on the latter, I've no idea.
Anyway, the Windows and Linux platforms are quite different. I would
expect to write libraries in other areas that need to work on both, but
not for things already covered by the standard C library.
Post by s***@casperkitty.com
Post by bartc
This is surprisingly difficult to obtain (outside of just using C).
Sometimes you can get a file handle which does the same job, but it's
not stdout. So if you passed it to a function which /did/ have access to
the real thing, then f==stdout would be false.
The difficulty is that there is no defined way in C for code to export any
symbol which isn't the direct result of a compile-time-constant-size
allocation, but some implementations may define stdout in a fashion that
yields a link-time constant address which has some particular relation to
other such addresses, and the only way to do that is via macro.
CLOCKS_PER_SECOND is another tricky one. I just assumed it was 1000 on
Windows and 1000000 on Linux, but a recent test seemed to indicate it
was 1000 on Linux too (despite bits/time.h suggesting it should be 1000000).
CLOCKS_PER_SECOND is 1000000 on Linux - it is a requirement of POSIX.
If your "recent test" indicated that it was 1000 on Linux, then your
recent test was incorrect - or you misunderstood what CLOCKS_PER_SECOND is.

It is common on Linux to have a "jiffies" rate of 1000 - that's the rate
of the main system timers for multi-tasking. But the value can differ
from system to system - sometimes a server build of the kernel will have
lower values for lower overhead, while a system designed for multimedia
or real-time work might have a kernel with a higher jiffie rate. And
that is precisely why CLOCKS_PER_SECOND is made constant for all POSIX
systems, so that you can get the same timing information regardless of
the actual rate of the internal timers.
Post by bartc
It's a ridiculous thing to be a stumbling block. But then it always is.
Of course it is a ridiculous thing to stumble on - so why are you
stumbling on it?
Post by bartc
(In my own languages, the equivalent of STDIN and STDOUT is just the
constant 0. That's a little bit easier than all this malarkey necessary
with C.)
BartC
2017-03-28 15:50:36 UTC
Permalink
Post by David Brown
Post by bartc
CLOCKS_PER_SECOND is another tricky one. I just assumed it was 1000 on
Windows and 1000000 on Linux, but a recent test seemed to indicate it
was 1000 on Linux too (despite bits/time.h suggesting it should be 1000000).
CLOCKS_PER_SECOND is 1000000 on Linux - it is a requirement of POSIX.
If your "recent test" indicated that it was 1000 on Linux, then your
recent test was incorrect - or you misunderstood what CLOCKS_PER_SECOND is.
Probably, I'll have to get back to it. (This was the reason I need that
couple of seconds delay a couple of weeks back, so that I could see
whether my os_clock routine (a wrapper around clock()) returned ~2000
(msec) rather than 2 or 2000000.)
Post by David Brown
It is common on Linux to have a "jiffies" rate of 1000 - that's the rate
of the main system timers for multi-tasking. But the value can differ
from system to system - sometimes a server build of the kernel will have
lower values for lower overhead, while a system designed for multimedia
or real-time work might have a kernel with a higher jiffie rate. And
that is precisely why CLOCKS_PER_SECOND is made constant for all POSIX
systems, so that you can get the same timing information regardless of
the actual rate of the internal timers.
Post by bartc
It's a ridiculous thing to be a stumbling block. But then it always is.
Of course it is a ridiculous thing to stumble on - so why are you
stumbling on it?
What is the purpose of defining CLOCKS_PER_SEC (not SECOND according to
Keith) if it is always 1000000?

Well, if you start coding in slightly more adventurous waters where you
don't have access to time.h (because of using a different language, or
one language compiles and executes another, neither of which is C, etc)
then these little details /can/ be obstacles if you're trying to use an
exported clock() function.

--
bart
Keith Thompson
2017-03-28 16:28:05 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
What is the purpose of defining CLOCKS_PER_SEC (not SECOND according to
Keith) if it always 1000000?
You know, rather than qualifying that with "according to Keith",
you could always check the C standard, which is exactly what I did.
You have a copy of n1570.pdf and a way to view it, yes? You have a
tendency to guess when it's completely unnecessary. Just look it up.
(I'm not criticizing you for getting the name wrong originally;
mistakes happen, and I've made plenty of them.)

Who says it's *always* 1000000? It's always 1000000 on systems
that conform to POSIX (and apparently on Windows systems as well),
but the C standard covers *all* conforming C implementations, not
just that subset. And even if it were always 10000000, having a
name for it is much more legible and less error-prone than using
a magic number.

You probably didn't even notice the extra 0 I snuck into the
previous paragraph.
Post by BartC
Well, if you start coding in slightly more adventurous waters where you
don't have access to time.h (because of using a different language, or
one language compiles and executes another, neither of which is C, etc)
then these little details /can/ be obstacles if you're trying to use an
exported clock() function.
Yes, some of what you're trying to do is difficult.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2017-03-28 18:45:42 UTC
Permalink
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!

Here's another example of a minor problem I had to solve this morning,
although only tangentially connected to C.

Usually a program invoked from the command line picks up all the
parameters:

program A B C
The three parameters are A B C (ignore the issue with Linux where
parameters involving * and ? can expand to any number).

But the program I'm writing, call it X, takes /some/ of the parameters,
and creates another program Y which runs under X. Y thinks it's a
command line program and itself expects to read some command line
parameters. The issue is this, given:

program A B C D E F

which of those parameters belong to X, and which should be left over to be
consumed by Y? (And in fact, Y can itself create a third program Z,
still running under the original process, which might need input too.)

I looked at a few options, such as restricting the parameters for X,
bracketing them, but for the time being settled for an explicit
delimiter, to keep things flexible for all programs:

program A B : D E F

A,B are passed to X; D,E,F are available for Y. (But things like > need
to be shared, and ":" is not available as input to either program.)

A really silly little problem but it needed to be solved. Like
CLOCKS_PER_SEC and stdout.
--
bartc
Keith Thompson
2017-03-28 19:06:28 UTC
Permalink
Post by BartC
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
And there goes your credibility.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Richard Heathfield
2017-03-28 19:14:28 UTC
Permalink
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
And there goes your credibility.
No, that went a long time ago.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
BartC
2017-03-28 19:24:18 UTC
Permalink
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
And there goes your credibility.
But has he ever admitted any of that?
Post by Keith Thompson
A small 128 bit type with a few operations should be a great deal
simpler.

(I'm not sure what a /small/ 128-bit type is compared with a
regular-sized one!)
--
bart
David Brown
2017-03-28 19:33:29 UTC
Permalink
Post by BartC
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
And there goes your credibility.
But has he ever admitted any of that?
Post by Keith Thompson
A small 128 bit type with a few operations should be a great deal
simpler.
(I'm not sure what a /small/ 128-bit type is compared with a
regular-sized one!)
I meant a small implementation of a 128-bit type, or an implementation of
a small number of operations for a 128-bit type. The request was for a
128-bit type with basic comparisons ==, < and >, along with the rather
useful information that you have int64_t and uint64_t (and therefore a
neat two's complement system with no odd sizes to worry about).

And yes, implementing this is a simple matter, and vastly easier than
creating a full 128-bit integer type integrated into a compiler (which
is what the comparison referred to).
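A minimal sketch of what that looks like, assuming int64_t/uint64_t and two's complement as stated (the type and function names here are made up for illustration): the sign lives in the high half, so it is compared signed, while the low half is pure magnitude and is compared unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* A signed 128-bit value: hi carries the sign, lo is the low 64 bits. */
typedef struct { int64_t hi; uint64_t lo; } int128s;

static bool eq128(int128s a, int128s b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

static bool lt128(int128s a, int128s b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi;  /* signed compare on the high half decides */
    return a.lo < b.lo;      /* unsigned tie-break on the low half */
}

static bool gt128(int128s a, int128s b)
{
    return lt128(b, a);
}
```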
s***@casperkitty.com
2017-03-28 20:10:34 UTC
Permalink
Post by David Brown
I meant a small implementation of a 128-bit type, or an implementation of
a small number of operations for a 128-bit type. The request was for a
128-bit type with basic comparisons ==, < and >, along with the rather
useful information that you have int64_t and uint64_t (and therefore a
neat two's complement system with no odd sizes to worry about).
I'm curious what purposes would be served by a 128-bit type outside of
manual chunking optimizations (which would require a means of creating
barriers which type-based aliasing analysis would not reach across),
capturing the results of 64x64->128 multiplications (on processors that
support them), or feeding dividends to 128/64->64r64 divisions (likewise).
The only other applications that would need numbers that size (e.g. RSA
cryptography) would need numbers a lot bigger.
David Brown
2017-03-28 20:15:25 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
I meant a small implementation of a 128-bit type, or an implementation of
a small number of operations for a 128-bit type. The request was for a
128-bit type with basic comparisons ==, < and >, along with the rather
useful information that you have int64_t and uint64_t (and therefore a
neat two's complement system with no odd sizes to worry about).
I'm curious what purposes would be served by a 128-bit type outside of
manual chunking optimizations (which would require a means of creating
barriers which type-based aliasing analysis would not reach across),
capturing the results of 64x64->128 multiplications (on processors that
support them), or feeding dividends to 128/64->64r64 divisions (likewise).
The only other applications that would need numbers that size (e.g. RSA
cryptography) would need numbers a lot bigger.
There are not many situations where you need a 128 bit integer.
Cryptography is, as you say, a major use. And you get them as
intermediary types - if you have scaled integers stored as 64-bit types
and multiply them. You might also find them handy to store unique
identification numbers of some sort, or for other sorts of non-numerical
data.

But in this case, you would have to ask the OP of the 128-bit integer
thread.

David Brown
2017-03-28 19:29:44 UTC
Permalink
Post by BartC
Post by Keith Thompson
Well, if you start coding in slightly more adventurous waters ...
Yes, some of what you're trying to do is difficult.
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
I am fairly sure I have said no such thing.

I have said that the speed of a tool is rarely the only important
feature - and that a tool can be fast enough for speed not to be an
issue, even if there are alternative tools that are faster. Maybe that
was confusing you?

And I certainly think a good proportion of the things you find
extraordinarily difficult, complex or obtuse are not difficult at all.
You have an extraordinary ability to wander into the fields, find little
molehills, build them up into mountains of your imagination, and then
stumble over them. And then you complain here as though it were somehow
the fault of people who use Linux and/or gcc, conspiring to make your
life difficult.

But there are plenty of other things in programming that /are/ hard.
That includes many things in cross-platform development - that's why for
serious work, most people use toolkits or libraries where another group
has done the work already.
BartC
2017-03-28 19:47:36 UTC
Permalink
Post by David Brown
Post by BartC
Thanks. According to David Brown, /nothing/ is ever difficult, nothing
is too complicated, and no tool is too slow!
And I certainly think a good proportion of the things you find
extraordinarily difficult, complex or obtuse are not difficult at all.
You have an extraordinary ability to wander into the fields, find little
molehills, build them up into mountains of your imagination,
Much of it is fact. Yesterday or the day before I posted some stats
about the size of windows.h on various compilers. You can choose to
compile on a 20Kloc version, or a 500Kloc one.

Is it possible that one or two of them might be a teeny bit bloated?
According to you, never! There will also be some functionality they are
adding, some extra value, that is not present in the smaller version
(even if the small version still manages to compile the same program).

Never mind that 75% of the lines in gcc/mingw's windows.h (total 340K
lines in 160 files) are actually discarded, so are processed for nothing.

and then
Post by David Brown
stumble over them. And then you complain here as though it were somehow
the fault of people who use Linux and/or gcc, conspiring to make your
life difficult.
Yes, filling up header files full of complete junk, which, for people
like me who have to wade through the mess to find out how something
works, or why it doesn't, is a real problem.
Post by David Brown
But there are plenty of other things in programming that /are/ hard.
That includes many things in cross-platform development - that's why for
serious work, most people use toolkits or libraries where another group
has done the work already.
Yes, toolkits that only work on Linux! So much for cross-platform
development....
--
bartc
Keith Thompson
2017-03-28 15:07:34 UTC
Permalink
bartc <***@freeuk.com> writes:
[...]
Post by bartc
CLOCKS_PER_SECOND is another tricky one. I just assumed it was 1000 on
Windows and 1000000 on Linux, but a recent test seemed to indicate it
was 1000 on Linux too (despite bits/time.h suggesting it should be 1000000).
Do you mean CLOCKS_PER_SEC?

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jerry Stuckle
2017-03-27 23:01:44 UTC
Permalink
Post by Gordon Burditt
Post by BartC
I was under the impression that functions and things exported from
shared libraries could be accessed from any language using the right
ABI, not just the original language (eg. C).
Then you need the appropriate headers for that language.
If the language doesn't come with the system, the language writer,
not the system writer, may be responsible for supplying header
files. Consider, for example, PL/1 or COBOL for UNIX. I see
no COBOL header files here.
What non-kernel header files for C are supplied with Linux?
--
==================
Remove the "x" from my email address
Jerry Stuckle
***@attglobal.net
==================
Richard Bos
2017-03-25 21:45:44 UTC
Permalink
Post by jacobnavia
Post by BartC
Well, the contents of the struct don't matter, just the size. So they
could be replaced by char[48] in this case. Except that, if the same
declaration is used for a 32-bit target, the size will then be wrong.
No. We never use the structure directly we use a FILE *. The size of the
structure does NOT matter.
Extreme nitpick: yes, it does, but only in silly cases. You are allowed
to do this:

FILE t_stdin;

memcpy(&t_stdin, stdin, sizeof(FILE));
memset(stdin, 0, sizeof(FILE));
memcpy(stdin, &t_stdin, sizeof(FILE));

and all input from stdin must then function as if you'd never blanked
it.
You are not allowed to use t_stdin _as_ a FILE object (because,
uniquely, the address of a FILE is allowed to be as meaningful as its
contents); you are not allowed, unless I am very much mistaken, to use
file functions on stdin in between both copies; but you _are_ allowed to
treat it as an abstract object.
For this reason, you _must_ know its size. Not its contents, that is
off-limits, so the gist of your objection to Bart's usual wilful
confusion is perfectly valid; but FILE's size needs to be available to
the user-programmer.

Richard
jacobnavia
2017-03-25 22:19:57 UTC
Permalink
Post by Richard Bos
Extreme nitpick: yes, it does, but only in silly cases.
Yes, but then, it would be better to tell the people in the committee to
accept that FILE is an opaque structure and allow implementations that
do not support that kind of silly code. I see no practical application
of that code anyway.

That would not work with lcc-win, but you could do:

memcpy(&tmp, &stdin, sizeof(FILE *));
stdin = NULL;
memcpy(&stdin, &tmp, sizeof(FILE *));

You can also get silly with lcc-win...
James R. Kuyper
2017-03-25 23:10:28 UTC
Permalink
Post by Richard Bos
Post by jacobnavia
Post by BartC
Well, the contents of the struct don't matter, just the size. So they
could be replaced by char[48] in this case. Except that, if the same
declaration is used for a 32-bit target, the size will then be wrong.
No. We never use the structure directly; we use a FILE *. The size of the
structure does NOT matter.
Extreme nitpick: yes, it does, but only in silly cases. You are allowed
FILE t_stdin;
memcpy(&t_stdin, stdin, sizeof(FILE));
memset(stdin, 0, sizeof(FILE));
memcpy(stdin, &t_stdin, sizeof(FILE));
and all input from stdin must then function as if you'd never blanked
it.
You are not allowed to use t_stdin _as_ a FILE object (because,
uniquely, the address of a FILE is allowed to be as meaningful as its
contents); you are not allowed, unless I am very much mistaken, to use
file funtions on stdin in between both copies; but you _are_ allowed to
treat it as an abstract object.
For this reason, you _must_ know its size. Not its contents, that is
off-limits, so the gist of your objection to Bart's usual wilful
confusion is perfectly valid; but FILE's size needs to be available to
the user-programmer.
FILE is required to be an object type (7.21.1p2), and that is unchanged
from previous versions of the standard. However, in C99 and earlier,
object types and incomplete types were distinct non-overlapping
categories (6.2.5p1). In C2011, this was changed: incomplete types are
now a subset of object types. Someone carefully reviewed every use of
the terms "object type", "complete", and "incomplete", making the
appropriate changes in each case. It was deliberately decided to NOT
change that part of the wording of 7.21.1p2, with the result that the
meaning of those words HAS changed: it's now permitted for a FILE object
to have an incomplete type, which means the above code now has undefined
behavior (6.7p7). This decision was made to allow for it to be an array
of unknown size, or an opaque struct type.

I was completely thrown by this change in the type categories the first
time someone pointed it out to me. I've gotten used to it, but I'm still
occasionally surprised when it becomes relevant.
jacobnavia
2017-03-25 23:54:06 UTC
Permalink
However, in C99 and earlier, object types and incomplete types were
distinct non-overlapping categories (6.2.5p1). In C2011, this was
changed: incomplete types are now a subset of object types. Someone
carefully reviewed every use of the terms "object type", "complete", and
"incomplete", making the appropriate changes in each case. It was
deliberately decided to NOT change that part of the wording of 7.21.1p2,
with the result that the meaning of those words HAS changed: it's now
permitted for a FILE object to have an incomplete type, which means the
above code now has undefined behavior (6.7p7). This decision was made to
allow for it to be an array of unknown size, or an opaque struct type.
WOW!

I am 100% standards compliant then. I wasn't aware of this (good) decision!

It is good because obviously FILE is an object managed by the runtime and
should be used as a BLACK BOX.

Thanks for this information.
j***@verizon.net
2017-03-26 02:48:51 UTC
Permalink
Post by jacobnavia
However, in C99 and earlier, object types and incomplete types were
distinct non-overlapping categories (6.2.5p1). In C2011, this was
changed: incomplete types are now a subset of object types. Someone
carefully reviewed every use of the terms "object type", "complete", and
"incomplete", making the appropriate changes in each case. It was
deliberately decided to NOT change that part of the wording of 7.21.1p2,
with the result that the meaning of those words HAS changed: it's now
permitted for a FILE object to have an incomplete type, which means the
above code now has undefined behavior (6.7p7). This decision was made to
allow for it to be an array of unknown size, or an opaque struct type.
WOW!
I am 100% standards compliant then. I wasn't aware of this (good) decision!
It is good because obviously FILE is an object managed by the runtime and
should be used as a BLACK BOX.
Thanks for this information.
You're welcome! I thought you'd like it.
m***@gmail.com
2017-03-26 12:12:12 UTC
Permalink
Post by j***@verizon.net
Post by jacobnavia
However, in C99 and earlier, object types and incomplete types were
distinct non-overlapping categories (6.2.5p1). In C2011, this was
changed: incomplete types are now a subset of object types. Someone
carefully reviewed every use of the terms "object type", "complete", and
"incomplete", making the appropriate changes in each case. It was
deliberately decided to NOT change that part of the wording of 7.21.1p2,
with the result that the meaning of those words HAS changed: it's now
permitted for a FILE object to have an incomplete type, which means the
above code now has undefined behavior (6.7p7). This decision was made to
allow for it to be an array of unknown size, or an opaque struct type.
WOW!
I am 100% standards compliant then. I wasn't aware of this (good) decision!
It is good because obviously FILE is an object managed by the runtime and
should be used as a BLACK BOX.
Thanks for this information.
You're welcome! I thought you'd like it.
I agree that information hiding (FILE used as BLACK BOX) makes
sense. But I have a problem: The functions provided for FILE are not
sufficient for me. Let me explain:

I want to have a function that tells me for a FILE if I can read
(at least) one character without blocking.

Reading from a regular file will not block, but reading from a
character device (keyboard) or pipe (fifo) or socket can block.
Linux provides poll but poll uses a file descriptor. Windows has
PeekConsoleInput and PeekNamedPipe, which both use a handle.

In both cases there might be some characters in the buffer of the
FILE object. When there is something in the buffer, reading the next
char from a FILE will not block. The system-specific functions are
only needed when the buffer is empty.

For that reason I introduced the macro read_buffer_empty. This is
the extension to the C file functions that I need.

Determining the macro read_buffer_empty for a C run-time is not as
hard as people might think. When getc is defined as a macro, the
condition to determine, if the read buffer is empty is part of the
getc macro. You can just get it from there.

Basically the buffer handling of FILE objects has two possibilities:
Either there is a pointer comparison or a counter is checked. I have
a program that checks the properties of C compiler and C run-time
library. This program (chkccomp.c) tries to compile different
versions of my read_buffer_empty macro. Finally there is a test if
read_buffer_empty works as expected. The following variants are
checked:

#define read_buffer_empty(fp) ((fp)->_IO_read_ptr >= (fp)->_IO_read_end)
#define read_buffer_empty(fp) ((fp)->_cnt <= 0)
#define read_buffer_empty(fp) ((fp)->__cnt <= 0)
#define read_buffer_empty(fp) ((fp)->level <= 0)
#define read_buffer_empty(fp) ((fp)->_r <= 0)
#define read_buffer_empty(fp) ((fp)->ptr >= (fp)->getend)
#define read_buffer_empty(fp) (*((int *)&((char *)(fp))[%d])==0)
#define read_buffer_empty(fp) (*((int **)&((char *)(fp))[32]) >= *((int **)&((char *)(fp))[40]))

So one of my wishes for the next C standard is a macro (or function)
that provides the functionality of read_buffer_empty.


Regards,
Thomas Mertes
--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
s***@casperkitty.com
2017-03-26 21:00:01 UTC
Permalink
Post by m***@gmail.com
For that reason I introduced the macro read_buffer_empty. This is
the extension to the C file functions that I need.
Any such macro will require that FILE be a structure with certain known
characteristics beyond any mandated by the Standard. Any system using a
type without those characteristics will fail, whether it is a structure
or something else. I'm not sure how allowing "something else" makes
things any worse than they already were.

Personally, I think the Standard should have long ago added functions to
poll and perform I/O with a requested timeout, and an indicator whether they
should perform partial operations or whether they should behave in all-
or-nothing fashion, along with a compile-time means of indicating what
features might be supported on at least some streams, and a run-time means
of querying what features are available on a particular stream.

Not all features will be available on all implementations, but many
implementations will support at least some, and many programs may be able
to offer useful partial functionality on implementations which can't
support all the features. Having a standard way to request such things
would be better than requiring programmers to write different code to
interface with every OS.
Gordon Burditt
2017-03-27 07:03:05 UTC
Permalink
Post by jacobnavia
fopen, and should be disposed of with fclose. The function fread() uses
that pointer, but nowhere are any fields of that pointer specified. That
is a slightly different thing as
char[48];
All code that follows those rules is portable to any C compiler. And I
have never seen any code that doesn't follow those rules.
I believe I have. I think it was in early versions of Perl. It
probably isn't there in current versions. It had some kind of
shortcut that needed to peek at (and probably fiddle with) the
fields in a FILE if they matched one of a few common setups.

However, it was still "portable". The Configure script was prepared
to try and compile a test program using those fields, and if it
wouldn't compile or the test program didn't see the fields do what
was expected, it would turn off compilation of the "optimization".
I don't recall it ever being turned on when it didn't work. I think
the test program may have turned off the optimization on a few
systems where it could have worked.
Post by jacobnavia
New code can use new functionalities that could be added to fopen.
For instance opening URLs with it, and abstracting away the network layer.
I believe that has the effect of opening a potential security hole
in any program that accepts filenames as input without checking
that they are NOT URLs. If you want a function that opens remote
URLs, fine, but *DON'T* call it fopen().

(This was more than a theoretical issue with PHP, and the combination
of (a) being able to set variables in a HTTP request, and (b) feeding
a variable file name to PHP in PHP code, and (c) not shutting off
fopen() accepting remote URLs (it was a configurable option, and I
think the default was OFF) could result in a compromised system.
Granted, (a) and (b) are coding problems on the to-be-compromised
web site, and (c) was an administration problem. PHP has since
been changed (long, long ago) to make it less easy to confuse local
variables and those provided remotely, permitting execution of
whatever (malicious) PHP code was contained in the URL contents by
the target system running PHP.)
Post by jacobnavia
FILE *fopen( URL, "rn|");
This would start reading the URL into disk, determined by
NET_BUFSIZ in stdio.h. The optional vertical bar could mean non-locking.
A NULL would mean that the host or the network is down. An open
connection would have a non NULL result, that can be used by other
functions.
For networking I want much clearer error messages. For example, what if the
host wasn't in DNS, or the nameserver didn't reply, etc. NULL isn't very
descriptive here. Also not very descriptive is what happens if you get
part of the file and THEN the network goes down. That rarely happens
with local hard disks.
Post by jacobnavia
fread would have the same parameters, as fclose. An easy way to abstract
the network into C.
An easy way to abstract real viruses into your system.
Robert Wessel
2017-03-25 03:39:48 UTC
Permalink
On Fri, 24 Mar 2017 22:40:59 +0100, jacobnavia
Post by jacobnavia
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Clearly I have a lot to learn about making it hard to unravel!
typedef struct __FILE FILE;
And "struct __FILE" is defined NOWHERE.
Isn't that great?
You just can't access any of the fields of that structure, it is an
opaque structure. This
1) Keeps current practice. I have never seen anyone using any of those
fields directly.
2) It is concise and not at all verbose. Just ONE LINE!
That makes it rather more difficult to support fast macro versions of
things like getc and putc. OTOH, if you compiler supports those as
known builtin functions, or supports LTCG for library functions, that
would do mostly the same thing.

We can debate whether or not one should use getc/putc for I/O, but a
design consideration for C has always been that those are very fast,
and so are actually a reasonable alternative to the functions that
read or write larger chunks.

But yes, no one *should* be using those internal fields directly,
*except* for other parts of the standard library.

And there's also no reason you can't always implement getc as a
function, but that will surprise people expecting it to be fast.
jacobnavia
2017-03-25 08:13:36 UTC
Permalink
Post by Robert Wessel
That makes it rather more difficult to support fast macro versions of
things like getc and putc.
No, because getc and putc can be inlined, which is the equivalent of the
macros. It wouldn't be too difficult to do, and inlining in assembly is
much easier than those macros that expand to C code...
BartC
2017-03-25 11:24:45 UTC
Permalink
Post by Robert Wessel
On Fri, 24 Mar 2017 22:40:59 +0100, jacobnavia
Post by jacobnavia
typedef struct __FILE FILE;
That makes it rather more difficult to support fast macro versions of
things like getc and putc.
With the following code, gcc always generated a call to getc(), even
with -O3:

int c=0,sum=0;

while (c!='z'){
sum+= (c=getc(stdin));
}
--
bartc
Ian Collins
2017-03-25 19:42:03 UTC
Permalink
Post by BartC
Post by Robert Wessel
On Fri, 24 Mar 2017 22:40:59 +0100, jacobnavia
Post by jacobnavia
typedef struct __FILE FILE;
That makes it rather more difficult to support fast macro versions of
things like getc and putc.
With the following code, gcc always generated a call to getc(), even
int c=0,sum=0;
while (c!='z'){
sum+= (c=getc(stdin));
}
Does it?

cat x.c; gcc x.c -O3 -S; cat x.s
#include <stdio.h>

void f()
{
int c=0,sum=0;

while (c!='z'){
sum+= (c=getc(stdin));
}
}
.file "x.c"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
.L8:
movl __iob, %eax
.L2:
subl $1, %eax
testl %eax, %eax
movl %eax, __iob
js .L10
movl __iob+4, %edx
leal 1(%edx), %ecx
movl %ecx, __iob+4
cmpb $122, (%edx)
jne .L2
leave
ret
.p2align 4,,10
.p2align 3
.L10:
subl $12, %esp
pushl $__iob
call __filbuf
addl $16, %esp
cmpl $122, %eax
jne .L8
leave
ret
.size f, .-f
.ident "GCC: (GNU) 6.3.0"
--
Ian
BartC
2017-03-25 19:51:56 UTC
Permalink
Post by Ian Collins
Post by BartC
With the following code, gcc always generated a call to getc(), even
int c=0,sum=0;
while (c!='z'){
sum+= (c=getc(stdin));
}
Does it?
movl __iob, %eax
subl $1, %eax
testl %eax, %eax
movl %eax, __iob
js .L10
movl __iob+4, %edx
leal 1(%edx), %ecx
movl %ecx, __iob+4
cmpb $122, (%edx)
jne .L2
Yes:

# GNU C11 (tdm64-1) version 5.1.0 (x86_64-w64-mingw32)
# c.c -m64 -masm=intel -mtune=generic -march=x86-64 -O3 -fverbose-asm
....
.L2:
| call rbx # tmp92
| mov rcx, rax #, D.2589
| call getc #
| cmp eax, 122 # c,
| jne .L2 #,
....
Post by Ian Collins
.ident "GCC: (GNU) 6.3.0"
Maybe that makes a difference. But for how many decades has getc been a
macro just for this purpose? Has gcc only just got round to it?
--
bartc
Ian Collins
2017-03-25 20:10:01 UTC
Permalink
Post by BartC
Post by Ian Collins
Post by BartC
With the following code, gcc always generated a call to getc(), even
int c=0,sum=0;
while (c!='z'){
sum+= (c=getc(stdin));
}
Does it?
movl __iob, %eax
subl $1, %eax
testl %eax, %eax
movl %eax, __iob
js .L10
movl __iob+4, %edx
leal 1(%edx), %ecx
movl %ecx, __iob+4
cmpb $122, (%edx)
jne .L2
# GNU C11 (tdm64-1) version 5.1.0 (x86_64-w64-mingw32)
# c.c -m64 -masm=intel -mtune=generic -march=x86-64 -O3 -fverbose-asm
....
| call rbx # tmp92
| mov rcx, rax #, D.2589
| call getc #
| cmp eax, 122 # c,
| jne .L2 #,
....
Post by Ian Collins
.ident "GCC: (GNU) 6.3.0"
Maybe that makes a difference. But for how many decades has getc been a
macro just for this purpose? Has gcc only just got round to it?
No, the difference is -m64, which is odd...
--
Ian
BartC
2017-03-25 20:24:19 UTC
Permalink
Post by Ian Collins
Post by BartC
Post by Ian Collins
.ident "GCC: (GNU) 6.3.0"
Maybe that makes a difference. But for how many decades has getc been a
macro just for this purpose? Has gcc only just got round to it?
No, the difference is -m64, which is odd...
I tested with -m32, and tried -m64 to see if it made a difference. It
didn't.
--
bartc
Keith Thompson
2017-03-25 20:43:05 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
Maybe that makes a difference. But for how many decades has getc been a
macro just for this purpose? Has gcc only just got round to it?
gcc doesn't provide getc. If you're on Windows, you're probably using a
different library implementation than is commonly used on Linux.

The compiler sees a call to getc(), and it generates code for it in
accordance with the declaration (as a function, as an inline function,
or as a macro) it sees in <stdio.h>.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Ian Collins
2017-03-25 21:30:13 UTC
Permalink
Post by Keith Thompson
[...]
Post by BartC
Maybe that makes a difference. But for how many decades has getc been a
macro just for this purpose? Has gcc only just got round to it?
gcc doesn't provide getc. If you're on Windows, you're probably using a
different library implementation than is commonly used on Linux.
The compiler sees a call to getc(), and it generates code for it in
accordance with the declaration (as a function, as an inline function,
or as a macro) it sees in <stdio.h>.
That was the reply I originally typed. The odd thing was that on my
Solaris machine, gcc (unlike Sun cc) calls getc() in 64 bit mode but
expands the macro in 32 bit mode.
--
Ian
Jean-Marc Bourguet
2017-03-26 08:23:30 UTC
Permalink
Post by Ian Collins
That was the reply I originally typed. The odd thing was that on my
Solaris machine, gcc (unlike Sun cc) calls getc() in 64 bit mode but
expands the macro in 32 bit mode.
A differente default for threading support? POSIX mandates that getc is
thread save and has getc_unlocked for the case where you can handle such
issue yourself.

Yours,
--
Jean-Marc
s***@casperkitty.com
2017-03-24 22:08:31 UTC
Permalink
Post by BartC
But the main requirement is that this struct takes up 48 bytes in 64-bit
Windows (because stdin/out/err are defined as entries in an array of
such structs). So I know now that this won't work under Linux as it is.
I'll have to see how to make that possible without /my/ headers
suffering the same fate of drowning in a sea of macros and conditionals.
It is often convenient to have FILE be a struct whose layout coincides with
some sort of file-based structure used in the underlying system, but all that
really matters is that all code which uses FILE* as anything other than an
opaque pointer use it the same way. In fact, FILE doesn't even need to be
a defined struct, and file pointers don't have to point at *anything*. If
some underlying system assigns all files a handle from 0 to 255, and code
knows of any static object which is at least 256 bytes, then fopen() could do
something like

int fileNumber = openFile(...); // A method that returns 1-255 on success
if (fileNumber > 0)
return (FILE*)((char*)dummyObject)+fileNumber;

and fwrite() could do something like:

int fileNumber = (char*)theFile - (char*)dummyObject;
writeFile(fileNumber, ...);

Such code would be fully portable to any system with a suitable set of I/O
functions and the 256-byte object. The behavior of all the pointer arithmetic
would be fully defined by the C Standard.
BartC
2017-03-24 22:43:54 UTC
Permalink
Post by s***@casperkitty.com
Post by BartC
But the main requirement is that this struct takes up 48 bytes in 64-bit
Windows (because stdin/out/err are defined as entries in an array of
such structs). So I know now that this won't work under Linux as it is.
I'll have to see how to make that possible without /my/ headers
suffering the same fate of drowning in a sea of macros and conditionals.
It is often convenient to have FILE be a struct whose layout coincides with
some sort of file-based structure used in the underlying system, but all that
really matters is that all code which uses FILE* as anything other than an
opaque pointer use it the same way.
When I use FILE* types from outside of the C language, I just use the
equivalent of void*.

But I haven't yet managed to get hold of stdin/stdout which are
surprisingly difficult when no C headers are involved.

However I now know I can use _iob/_iob_func under Windows; I'll have to
find the equivalent for Linux.
--
bart
s***@casperkitty.com
2017-03-24 22:55:59 UTC
Permalink
Post by BartC
But I haven't yet managed to get hold of stdin/stdout which are
surprisingly difficult when no C headers are involved.
What does the Standard require of those? I would think that on some platforms
it would be most useful to say

FILE stdin[1];

and on some it would be more helpful to say:

FILE *stdin = (FILE*)&stdin_descriptor;

while others might use something like:

FILE *stdin = (FILE*)1; // Dummy address that won't identify a real object

Would the Standard forbid any of those (bearing in mind that the first would
require that headers declare stdin as FILE[], while the others would need it
to be declared as FILE*)? I could certainly imagine situations where each
could be superior.
Keith Thompson
2017-03-24 23:27:04 UTC
Permalink
Post by s***@casperkitty.com
Post by BartC
But I haven't yet managed to get hold of stdin/stdout which are
surprisingly difficult when no C headers are involved.
What does the Standard require of those? I would think that on some platforms
it would be most useful to say
FILE stdin[1];
FILE *stdin = (FILE*)&stdin_descriptor;
FILE *stdin = (FILE*)1; // Dummy address that won't identify a real object
Would the Standard forbid any of those (bearing in mind that the first would
require that headers declare stdin as FILE[], while the others would need it
to be declared as FILE*)? I could certainly imagine situations where each
could be superior.
stdin, stdout, and stderr are required to be macros, so you'd
additionally need something like:

#define stdin stdin
#define stdout stdout
#define stderr stderr

(GNU libc does this.)

Your first suggestion:

FILE stdin[1];

would be non-conforming, since stdin is required to expand to an
expression of type FILE*.

Making FILE an incomplete type is a mildly interesting idea, but some
stdio functions are commonly defined as macros that access the members
of the FILE object. They'd have to use pointer casts to some complete
type.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
jacobnavia
2017-03-25 00:17:15 UTC
Permalink
Post by Keith Thompson
Making FILE an incomplete type is a mildly interesting idea,
Implemented in lcc-win 64.
jacobnavia
2017-03-25 00:18:08 UTC
Permalink
Post by Keith Thompson
some
stdio functions are commonly defined as macros that access the members
of the FILE object
They are real functions when the pointer is opaque. They can be easily
inlined though, but I haven't done that yet.
Scott Lurndal
2017-03-27 15:06:36 UTC
Permalink
Post by BartC
Post by Keith Thompson
(I'm sure you mean the *declaration* of printf.) It doesn't vary a
whole lot, though some systems will probably have to make the "restrict"
optional somehow to allow for pre-C99 compilers.
But the definition of type FILE probably does vary a great deal.
I tried to have a look, but after several minutes I couldn't even find
what FILE was defined as! They seem intent on making this impenetrable.
I worked out that FILE was an alias for 'struct _IO_FILE', and that that
was probably defined in libio.h.
Oh, hang on, I think I've just found it. _IO_FILE occurs quite a lot in
typedef struct _IO_FILE _IO_FILE;
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Ah, so you don't care about thread-safety, it appears. Your
data structure layout also leaves holes on 64-bit systems.
BartC
2017-03-27 15:37:55 UTC
Permalink
Post by Scott Lurndal
Post by BartC
Post by Keith Thompson
(I'm sure you mean the *declaration* of printf.) It doesn't vary a
whole lot, though some systems will probably have to make the "restrict"
optional somehow to allow for pre-C99 compilers.
But the definition of type FILE probably does vary a great deal.
I tried to have a look, but after several minutes I couldn't even find
what FILE was defined as! They seem intent on making this impenetrable.
I worked out that FILE was an alias for 'struct _IO_FILE', and that that
was probably defined in libio.h.
Oh, hang on, I think I've just found it. _IO_FILE occurs quite a lot in
typedef struct _IO_FILE _IO_FILE;
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Ah, so you don't care about thread-safety, it appears.
It was lifted from MS's own headers (the ones supplied with MSVC2008, a
32-bit system):

struct _iobuf {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
};
typedef struct _iobuf FILE;
Post by Scott Lurndal
Your
data structure layout also leaves holes on 64-bit systems.
C requires padding on such a structure for 64-bit pointers. I don't know
what struct is actually used, but the size is 48 bytes, and mine is 48
bytes, which is all that is really needed.

I don't know what significance the presence of padding holes has.

BTW here's the struct as defined in the Tiny C's stdio.h for 64-bit Windows:

struct _iobuf {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
};
typedef struct _iobuf FILE;

Look familiar? And actually, it's identical to what's used in
gcc/mingw's stdio.h. I hadn't bothered looking before because I assumed
it would be something horrendous.

Anyway, if you're saying my header has something wrong with it
(obviously, trying to belittle my efforts), then you need to complain to
the gcc/mingw people too.
--
bartc
Scott Lurndal
2017-03-27 16:25:21 UTC
Permalink
Post by BartC
Post by Scott Lurndal
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Ah, so you don't care about thread-safety, it appears.
It was lifted from MS's own headers (the ones supplied with MSVC2008, a
So what? MS is hardly an exemplar of intelligent operating system
design.
Post by BartC
Post by Scott Lurndal
Your
data structure layout also leaves holes on 64-bit systems.
C requires padding on such a structure for 64-bit pointers. I don't know
what struct is actually used, but the size is 48 bytes, and mine is 48
bytes, which is all that is really needed.
You could easily reduce that to 44 bytes by simply rearranging
the order of the elements. Anything to make more effective use
of the limited L1/L2 cache hierarchies is useful, particularly in
library code. Four bytes may not seem much, but for an array of
several hundred FILE instances, it adds up quickly.
Post by BartC
Anyway, if you're saying my header has something wrong with it
(obviously, trying to belittle my efforts), then you need to complain to
the gcc/mingw people too.
I simply pointed out:

1) You don't take into account that many of the functions that
access FILE are required to be thread-safe (e.g. getc),
so one would expect some form of mutual exclusion mechanism
to be included in struct FILE.

2) One should, in library code, be very cognizant of the data
structure layout for performance reasons.

Here's one on linux:

struct _IO_FILE {
int _flags; /* 0 4 */

/* XXX 4 bytes hole, try to pack */

char * _IO_read_ptr; /* 8 8 */
char * _IO_read_end; /* 16 8 */
char * _IO_read_base; /* 24 8 */
char * _IO_write_base; /* 32 8 */
char * _IO_write_ptr; /* 40 8 */
char * _IO_write_end; /* 48 8 */
char * _IO_buf_base; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
char * _IO_buf_end; /* 64 8 */
char * _IO_save_base; /* 72 8 */
char * _IO_backup_base; /* 80 8 */
char * _IO_save_end; /* 88 8 */
class _IO_marker * _markers; /* 96 8 */
class _IO_FILE * _chain; /* 104 8 */
int _fileno; /* 112 4 */
int _flags2; /* 116 4 */
__off_t _old_offset; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
short unsigned int _cur_column; /* 128 2 */
signed char _vtable_offset; /* 130 1 */
char _shortbuf[1]; /* 131 1 */

/* XXX 4 bytes hole, try to pack */

_IO_lock_t * _lock; /* 136 8 */
__off64_t _offset; /* 144 8 */
void * __pad1; /* 152 8 */
void * __pad2; /* 160 8 */
void * __pad3; /* 168 8 */
void * __pad4; /* 176 8 */
size_t __pad5; /* 184 8 */
/* --- cacheline 3 boundary (192 bytes) --- */
int _mode; /* 192 4 */
char _unused2[20]; /* 196 20 */

/* size: 216, cachelines: 4, members: 29 */
/* sum members: 208, holes: 2, sum holes: 8 */
/* last cacheline: 24 bytes */
};

Yes, they have holes - I suspect they're there for backward
compatibility.
Scott Lurndal
2017-03-27 17:57:16 UTC
Permalink
Post by Scott Lurndal
Post by BartC
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Anyway, if you're saying my header has something wrong with it
(obviously, trying to belittle my efforts), then you need to complain to
the gcc/mingw people too.
struct _IO_FILE {
[omitted]
Post by Scott Lurndal
char _unused2[20]; /* 196 20 */
/* size: 216, cachelines: 4, members: 29 */
/* sum members: 208, holes: 2, sum holes: 8 */
/* last cacheline: 24 bytes */
};
And the one thing you should take home from this example is that
they've futureproofed the structure by leaving unused bytes
at the end to support adding new fields in the future without
breaking binary compatibility with existing applications.
BartC
2017-03-27 18:12:44 UTC
Permalink
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Ah, so you don't care about thread-safety, it appears.
It was lifted from MS's own headers (the ones supplied with MSVC2008, a
So what? MS is hardly an exemplar of intelligent operating system
design.
I'm not sure what you're on about. Are you having a go at me or at
Microsoft, or both?

In any case, you don't seem to understand that I can't just make up my
own FILE structure, it has to be compatible with what is exported by
MSVCRT.DLL. If I was making up my own stuff, I wouldn't stop with just
one small structure, I would change everything, including the language
(this is what I used to do actually).
Post by Scott Lurndal
You could easily reduce that to 44 bytes by simply rearranging
the order of the elements. Anything to make more effective use
of the limited L1/L2 cache hierarchies is useful, particularly in
library code. Four bytes may not seem much, but for an array of
several hundred FILE instances, it adds up quickly.
Then it won't work because it has to be 48 bytes to work properly with
the FILE _iob[] array exported by MSVCRT.DLL. The first three elements
have been set up to work as C's stdin, stdout and stderr descriptors.
The stride used is 48 bytes.

(For just that purpose, I can hard-code the 48 bytes via char pointer
casts, but then it won't work in 32 bits. This way it should work on both.)
Post by Scott Lurndal
2) One should, in library code, be very cognizant of the data
structure layout for performance reasons.
In every other case, I do exactly that. In spades. My structs are
usually carefully crafted to be a power-of-two bytes in size, if there
are going to be lots of them. Here's one which looks bigger but is
actually only 16 bytes:


struct _varrec {
union {
struct {
uint16 tag;
byte hasref;
union {
byte stackadj;
byte opdims;
byte ittype;
};
};
uint32 tagx;
};
union {
struct {
uint16 refelemtag;
byte refbitoffset;
byte spare2;
};
struct {
int16 frameoffset;
byte exceptiontype;
byte nexceptions;
};
int32 frameptr_low;
int32 itcount;
int32 bndigits;
};
union {
objrec * objptr;
struct _varrec* varptr;
byte * packptr;
byte * ptr;
int64 * dptr;
int64 value;
uint64 uvalue;
double xvalue;
int32 * retaddr;
struct {
int32 range_lower;
int32 range_upper;
};
};
};
--
bartc
Robert Wessel
2017-03-27 19:19:20 UTC
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Post by BartC
typedef struct {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
} FILE;
Ah, so you don't care about thread-safety, it appears.
It was lifted from MS's own headers (the ones supplied with MSVC2008, a
So what? MS is hardly an exemplar of intelligent operating system
design.
Given that few programs likely create large numbers of FILE
structures, a bit of excess padding is not much of an issue. Plus
that way MS didn't have to do a different version for Win64 (which
would have given BartC another #ifdef to complain about). Besides it
wouldn't have actually made any difference anyway (see below).
Post by Scott Lurndal
Post by BartC
Post by Scott Lurndal
Your
data structure layout also leaves holes on 64-bit systems.
C requires padding on such a structure for 64-bit pointers. I don't know
what struct is actually used, but the size is 48 bytes, and mine is 48
bytes, which is all that is really needed.
You could easily reduce that to 44 bytes by simply rearranging
the order of the elements. Anything to make more effective use
of the limited L1/L2 cache hierarchies is useful, particularly in
library code. Four bytes may not seem much, but for an array of
several hundred FILE instances, it adds up quickly.
You probably actually couldn't. With 64-bit pointers, the minimum
alignment for the structure would (usually) be 8 bytes, and you'd
ended up with four bytes of padding at the end of each instance for an
array. Nor would those extra four bytes usually be enough to allow a
smaller physical allocation if each instance were malloc'd.
David Brown
2017-03-24 18:25:09 UTC
Post by BartC
Post by David Brown
Post by BartC
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
<https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect>
__builtin_expect(x, y) has the value "x", but tells the compiler that
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
if (likely(n > 3)) ...
and know that the compiler will use branch prediction instructions or
optimisation on the assumption that "n > 3" should be the fastest path.
Suppose you get it wrong?
Then your code is a little slower than it could be.
Post by BartC
Anyway the real problem is writing things like __builtin_expect(....)
all over the place (and dozens of other examples), and then trying to
compile it on something that doesn't understand __builtin_expect. I
don't want to have to implement both C and half of gcc, even if the latter
just involves having to look all this stuff up and devising a workaround.
It is a pretty minor part of writing a C compiler, I would think.
Post by BartC
(I don't know the provenance of this particular set of sources files,
and whether they were specifically configured for gcc, or whether they
could be configured for anything else. But anything that needs to be
configured for specific compilers would ring alarm bells if it will not
also work with a generic C compiler.)
Implementation-specific files are allowed to use implementation-specific
extensions.

And an OS or library is allowed to place requirements on the compilers
used for it. There are several compilers typically used on Linux
systems (gcc, clang, icc, plus a few others). They manage all right.
Post by BartC
Post by David Brown
#define __builtin_expect(x, y) (x)
It's not standard C, so I'm not interested. I just wanted to compile
some code not get involved in gcc yet again.
The above define is perfectly good standard C (with an implementation
reserved identifier).
Post by BartC
Post by David Brown
That is because Linux - like most operating systems except Windows - has
system headers to let people write programs for their system. And these
system headers are /not/ the same as a compiler's standard headers, and
the compiler's standard headers are not the same as the C library's
headers. In the Windows world, these are usually thrown together in one
lump because the OS does not have standard headers, and because
compilers for Windows usually don't have a choice of libraries.
I'm not talking about OS system headers, but the standard headers from
the C standard.
The C standard describes some of the features of the standard headers,
it does not give them as complete files. (They don't even have to /be/
files.)
Post by BartC
For Windows, I plan to use the C library provided in msvcrt.dll. For
Linux, I think the equivalent is libc.so.6.
Yes.

But people typically want more than just the standard C library - they
usually also want access to OS-specific functions.
Post by BartC
The standard headers need to provide function signatures for functions
exported from those libraries. Plus define those other entities
mentioned in the C standard (NULL for example).
There doesn't seem to me to be much to it.
Now, to go beyond the C standard library, in Windows a big give-away is
the presence of windows.h, which I don't intend to duplicate at the
minute (I can borrow one from an existing compiler, but not gcc because
it's full of advanced gcc-specific features and is also up to 20 times
the size of the smallest).
In Linux, it's a bit trickier - perhaps headers such as dirent.h. The
dividing line between C functions and OS functions isn't as clear, and
people will probably use headers from either without even thinking about it.
This is why I'm only bothering with C sources either known to compile on
Windows with /any/ compiler, or that are designed to, and do, compile on
both OSes.
Supporting dirent.h etc can come later, however debugging what goes
wrong is hard if the only other compiler I can compare with happens to
be gcc.
Post by David Brown
Post by BartC
The same program using Linux headers will fetch 80 includes.
So what?
So 800 would be no problem? What about 8000? We are still talking about
STANDARD C. The library isn't that big, but you're not curious as to why
Linux and/or gcc make such a meal of it?
I have a reasonable idea of why there are many files included by Linux
and/or gcc. And I don't see a problem with it.
s***@casperkitty.com
2017-03-24 15:10:21 UTC
Post by BartC
__builtin_expect is something to do with you telling the compiler
whether you think some particular expression is more likely to be true
than false, or something along those lines.
You don't want stuff like that cluttering up your code. If you're going
to those lengths, then you might as well write assembly code.
In order for an entity (human or automated) to generate efficient code, it
needs to know a variety of things. There are some things programmers will
know more about than a compiler (e.g. what kinds of inputs is a program more
likely to receive, or which parts of a program will cause more annoyance if
they run slowly) and there are some things a compiler is likely to know more
about than a programmer (e.g. how will the costs of different alternative
versions of a piece of code compare in the "true" and "false" cases).

The purpose of things like "expect" directives generally isn't to compel
compilers to generate code a precise way, but rather to give the compiler
information which it may use or not as it sees fit; a good compiler should
be able to combine that information with the things it knows to make good
decisions about code generation. If the programmer doesn't tell the compiler
which branches will be more likely *with the kinds of input the program is
expected to receive* how is a compiler supposed to know?
Scott Lurndal
2017-03-24 13:01:51 UTC
Post by BartC
Post by jacobnavia
Hi
I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
You've missed a few locations (gcc has its own directories in addition
to the directories provided by the operating system).

gcc -H

Note that all the headers are in /usr/include, but they may include
implementation-dependent headers from other directories.
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full of
'__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
#define __builtin_expect(x,y) (1)
David Brown
2017-03-24 14:50:41 UTC
Post by Scott Lurndal
Post by BartC
Post by jacobnavia
Hi
I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
You've missed a few locations (gcc has its own directories in addition
to the directories provided by the operating system).
gcc -H
Note that all the headers are in /usr/include, but they may include
implementation-dependent headers from other directories.
Post by BartC
As I said elsewhere, it's hard to prise gcc and Linux apart. (Just
yesterday I had to try and compile code - on Windows this time - full of
'__builtin_expect(x,y)'. __builtin_expect is a gcc thing.)
#define __builtin_expect(x,y) (1)
You mean:

#define __builtin_expect(x,y) (x)
David Brown
2017-03-24 12:07:37 UTC
Post by jacobnavia
Hi
I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
To avoid these problems, I decided to move the includes I am using from
windows to linux, but that would mean that I port my standard C library
into linux, not an easy task, especially because the source code is 100%
adapted to lcc and not easily used by other compilers, and that could
introduce a new set of bugs!
Besides, this produces binary incompatibilities between the generated
code and what the functions in the library expect, even more trouble...
So, I have no other choice than to parse include files full of "gccisms"
and to make that work, and that starts with figuring out WHICH include
files should I use, hence my question in the subject line.
The standard is silent on this subject, not even specifying a common
command that all compilers could support, telling the user how the
search paths are built.
gcc has a number of command line options which can be useful here. It
can print out the list of included headers while processing a file, and
also a list of options (built-in defaults, environment variables,
command-line options, etc.). Googling for "gcc print out include path"
gives this:

<http://stackoverflow.com/questions/17939930/finding-out-what-the-gcc-include-path-is>
jacobnavia
2017-03-24 14:48:59 UTC
Post by David Brown
gcc has a number of command line options which can be useful here. It
can print out the list of included headers while processing a file, and
also a list of options (built-in defaults, environment variables,
command-line options, etc.). Googling for "gcc print out include path"
<http://stackoverflow.com/questions/17939930/finding-out-what-the-gcc-include-path-is>
Thank you. The first person to answer the question asked!

echo | gcc -E -Wp,-v -

Interesting. This gives me the following algorithm for the lcc compiler:

1) Look for ~/.lcc/includes
2) If not found, do
system("echo | gcc -E -Wp,-v -");

and redirect stderr into a file. Then parse that file and write it into
~/.lcc/includes.
3) Read ~/.lcc/includes and set the path for the compiler.

Does that look sensible to you?
David Brown
2017-03-24 15:26:32 UTC
Post by jacobnavia
Post by David Brown
gcc has a number of command line options which can be useful here. It
can print out the list of included headers while processing a file, and
also a list of options (built-in defaults, environment variables,
command-line options, etc.). Googling for "gcc print out include path"
<http://stackoverflow.com/questions/17939930/finding-out-what-the-gcc-include-path-is>
Thank you. The first person to answer the question asked!
Sometimes I do try to be helpful :-)

However, note that my answer was the first result of an obvious google
search that would have taken you less time than writing the newsgroup
post. (But then we would miss out on the entertainment.)
Post by jacobnavia
echo | gcc -E -Wp,-v -
1) Look for ~/.lcc/includes
2) If not found, do
system("echo | gcc -E -Wp,-v -");
and redirect stderr into a file. Then parse that file and write it into
~/.lcc/includes.
3) Read ~/.lcc/includes and set the path for the compiler.
Does that look sensible to you?
Running "echo | gcc -E -Wp,-v -" on my old Linux system I get:

ignoring nonexistent directory
"/usr/lib/gcc/x86_64-redhat-linux/4.5.1/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc/x86_64-redhat-linux/4.5.1/include
/usr/include


Running it on a newer system I get:

ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/5/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/5/include
/usr/local/include
/usr/lib/gcc/x86_64-linux-gnu/5/include-fixed
/usr/include/x86_64-linux-gnu
/usr/include


Running it on a local installation of a newer gcc:

echo | /usr/local/gcc-7-20170226/bin/gcc -E -Wp,-v -
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory
"/usr/local/gcc-7-20170226/lib/gcc/x86_64-pc-linux-gnu/7.0.1/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/gcc-7-20170226/lib/gcc/x86_64-pc-linux-gnu/7.0.1/include
/usr/local/include
/usr/local/gcc-7-20170226/include
/usr/local/gcc-7-20170226/lib/gcc/x86_64-pc-linux-gnu/7.0.1/include-fixed
/usr/include/x86_64-linux-gnu
/usr/include


So the order seems to be:

1. The compiler's own include directory
2. The machine's local include files (for development libraries
installed independently from the distro's package management)
3. The "include-fixed" headers, which gcc creates during installation to
fix and override a few system headers.
4. The OS include directory
5. The system include directory (where most development library headers
are kept, including normal C library headers).


I would recommend trying to replicate that directly, rather than writing
it into a user configuration file. Users might reasonably expect to be
able to move their lcc configuration file to a different computer and
still have things work correctly.

Additionally, people may have more than one installation of lcc. Now
that you support more than one OS and more than one target, it is
natural to think about cross-compilation - and the header files for
building Windows programs on Linux, or ARM code on x86 hosts, are going
to be in different places.
Gareth Owen
2017-03-24 16:32:51 UTC
Post by David Brown
Post by jacobnavia
Thank you. The first person to answer the question asked!
Sometimes I do try to be helpful :-)
Well cut that out.

It is c.l.c law that you are first required to state that

"comp.lang.c. is just for discussing standard C, and mostly for
discussing whether I should cast the return value of malloc()"

and then add

"questions about specific compilers should be directed to their specific
newsgroups, even if those newsgroups have been dead for years".
James Kuyper
2017-03-24 16:53:26 UTC
Post by Gareth Owen
Post by David Brown
Post by jacobnavia
Thank you. The first person to answer the question asked!
Sometimes I do try to be helpful :-)
Well cut that out.
It is c.l.c law that you are first required to state that
"comp.lang.c. is just for discussing standard C, and mostly for
discussing whether I should cast the return value of malloc()"
and then add
"questions about specific compilers should be directed to their specific
newsgroups, even if those newsgroups have been dead for years".
That should be "an appropriate forum" rather than "their specific
newsgroup" - many modern compilers are discussed in places other than
newsgroups.

Note also - if the most appropriate forum for discussing a specific
compiler is dead, then the compiler is probably dead, too. If interest
in a compiler has waned so much that you can't reasonably expect to find
someone to answer your questions in the appropriate forum, you're even
less likely to find someone to answer your question when posting to an
inappropriate forum.
jacobnavia
2017-03-24 17:24:45 UTC
Post by James Kuyper
Note also - if the most appropriate forum for discussing a specific
compiler is dead, then the compiler is probably dead, too.
The last message posted to gnu.g++.help was on Oct 19th, 2016. It was
someone asking a question. No answer ensued.


Is g++ dead then?
Scott Lurndal
2017-03-24 17:29:55 UTC
Post by jacobnavia
Post by James Kuyper
Note also - if the most appropriate forum for discussing a specific
compiler is dead, then the compiler is probably dead, too.
The last message posted to gnu.g++.help was on oct 19th 2016. It was
someone asking a question. No answer ensued.
Is g++ dead then?
you missed "the most appropriate forum", which for g++ is a mailing list,
not the usenet group.
j***@verizon.net
2017-03-24 17:55:10 UTC
Post by jacobnavia
Post by James Kuyper
Note also - if the most appropriate forum for discussing a specific
compiler is dead, then the compiler is probably dead, too.
The last message posted to gnu.g++.help was on oct 19th 2016. It was
someone asking a question. No answer ensued.
Is g++ dead then?
I specified the "most appropriate" forum. How did you reach the conclusion that gnu.g++.help was more appropriate than the mailing list and IRC channels mentioned at <http://www.gnu.org/software/gethelp.html>?
s***@casperkitty.com
2017-03-24 17:47:36 UTC
Post by David Brown
1. The compiler's own include directory
2. The machine's local include files (for development libraries
installed independently from the distro's package management)
3. The "include-fixed" headers, which gcc creates during installation to
fix and override a few system headers.
4. The OS include directory
5. The system include directory (where most development library headers
are kept, including normal C library headers).
I find rather sloppy the practice of relying upon include file search paths
to find header files. Such practice can lead to some nasty problems if
two different libraries have almost identical internal-use header files with
the same name. Even if either header would work if used everywhere, code
which is in the same directory as one of the headers will use it while code
which isn't in that same directory will use the other.

I wonder if any compilers would be made less useful by a rule which required
that compilers recognize

#include "this" "that"

as a valid form of include equivalent to

#include "thisthat"

in a fashion analogous to the handling of strings in other contexts? That
would make it practical to say:

#include LIBDIRS
#include LIB1DIR "fileFromLib1.h"
#include LIB2DIR "fileFromLib2.h"

etc. and have "LIBDIRS" be set at the command line to point to a project-wide
file containing the appropriate paths for all the libraries used therein.
It would almost be practical to use

#include LIBDIRS
#include makeHeaderPath(LIB1DIR,fileFromLib1)
#include makeHeaderPath(LIB2DIR,fileFromLib2)

but there would be no way to protect filenames from macro substitution.
Keith Thompson
2017-03-24 16:37:52 UTC
Post by jacobnavia
I am starting to get results with my ARM64 port. lcc-win is compiling
big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I need
to parse the includes furnished by the linux system/gcc. There is a
confusing mixture of include files in
/usr/include,
/usr/include/aarch64/bits,
/usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
By asking gcc. Compile some random source file with "gcc -v". On my
system, I get:

ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/6/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/6/include
/usr/local/include
/usr/lib/gcc/x86_64-linux-gnu/6/include-fixed
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Les Cargill
2017-03-25 20:07:15 UTC
Post by jacobnavia
Hi
I am starting to get results with my ARM64 port. lcc-win is
compiling big programs now, without any problems...
The acid test is to compile the compiler with itself. To do that, I
need to parse the includes furnished by the linux system/gcc. There
is a confusing mixture of include files in /usr/include,
/usr/include/aarch64/bits, /usr/local/include/aarch64,
and a LONG list of "places" where includes with the same name are stored.
Which one should I use? How could I figure out which ones are used by gcc?
You use the one that best fits your architecture. A way to do this
is to run the preprocessor on a trivial program and see which
directories get included.

In effect there are a bunch of hidden #define/--option like things
built into the gcc chain that tell it which directories to use.

https://gcc.gnu.org/install/configure.html
Post by jacobnavia
To avoid these problems, I decided to move the includes I am using
from windows to linux, but that would mean that I port my standard C
library into linux, not an easy task, especially because the source
code is 100% adapted to lcc and not easily used by other compilers,
and that could introduce a new set of bugs!
Besides, this produces binary incompatibilities between the
generated code and what the functions in the library expect, even
more trouble...
So, I have no other choice than to parse include files full of
"gccisms" and to make that work, and that starts with figuring out
WHICH include files should I use, hence my question in the subject
line.
The standard is silent on this subject, not even specifying a common
command that all compilers could support, telling the user how the
search paths are built.
That is absolutely correct. It's completely implementation dependent.
--
Les Cargill