Post by bartc
Post by Thiago Adams
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if, #else, etc., I have a kind of state machine that tells whether tokens are ignored or not.
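One way such a state machine could look is sketched below. The post does not show its code, so every name here (CondStack, cond_if, cond_else, cond_endif, cond_including) is hypothetical, and #elif is left out to keep it short. Each open #if gets one state, and a token is kept only when every enclosing state is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; the post does not show its actual code. */
enum { MAX_DEPTH = 64 };

typedef enum {
    COND_ACTIVE,   /* the current branch is being included */
    COND_WAITING,  /* no branch taken yet; a later #else may be taken */
    COND_DONE      /* a branch was already taken, or the parent is skipped */
} CondState;

typedef struct {
    CondState stack[MAX_DEPTH];
    int depth;
} CondStack;

/* Tokens are kept only when every enclosing conditional is active. */
static bool cond_including(const CondStack *cs)
{
    for (int i = 0; i < cs->depth; i++)
        if (cs->stack[i] != COND_ACTIVE)
            return false;
    return true;
}

/* Called on #if, with the already evaluated condition. */
static void cond_if(CondStack *cs, bool value)
{
    assert(cs->depth < MAX_DEPTH);
    /* Inside a skipped region the nested group can never become active. */
    bool parent_active = cond_including(cs);
    CondState st = COND_DONE;
    if (parent_active)
        st = value ? COND_ACTIVE : COND_WAITING;
    cs->stack[cs->depth++] = st;
}

/* Called on #else: take the branch only if nothing was taken before. */
static void cond_else(CondStack *cs)
{
    assert(cs->depth > 0);
    CondState *top = &cs->stack[cs->depth - 1];
    *top = (*top == COND_WAITING) ? COND_ACTIVE : COND_DONE;
}

/* Called on #endif. */
static void cond_endif(CondStack *cs)
{
    assert(cs->depth > 0);
    cs->depth--;
}
```

With this shape, NextToken would simply drop tokens while cond_including() returns false.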
#define A 0
1 + /*comment*/ A
The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.
I am planning to collect #define, #undef and comments and put them inside AST nodes.
When the AST node is created it will ask the scanner "give me all the collected comments and preprocessor" - "Clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
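The "give me everything collected, then clear the list" hand-off might be sketched as follows. The types and function names (TriviaList, Scanner, scanner_collect, scanner_take_trivia, make_primary) are assumptions for illustration, not the actual code behind the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the hand-off, not the author's code. */
enum { MAX_TRIVIA = 32 };

typedef struct {
    const char *items[MAX_TRIVIA]; /* text of each comment or directive */
    size_t count;
} TriviaList;

typedef struct {
    TriviaList pending; /* collected since the last AST node was created */
} Scanner;

/* Called by the scanner when it passes a comment, #define or #undef. */
static void scanner_collect(Scanner *s, const char *text)
{
    assert(s->pending.count < MAX_TRIVIA);
    s->pending.items[s->pending.count++] = text;
}

/* Returns everything collected so far and clears the pending list. */
static TriviaList scanner_take_trivia(Scanner *s)
{
    TriviaList taken = s->pending;
    s->pending.count = 0;
    return taken;
}

typedef struct {
    const char *lexeme;  /* e.g. "0" */
    TriviaList leading;  /* trivia seen before this node's token */
} PrimaryExpression;

/* Node creation asks the scanner for the collected trivia. */
static PrimaryExpression make_primary(Scanner *s, const char *lexeme)
{
    PrimaryExpression node;
    node.lexeme = lexeme;
    node.leading = scanner_take_trivia(s);
    return node;
}
```

In the example above, the scanner would collect /*comment*/ while skipping to the expansion of A, and the node created for 0 would receive it.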
Using this scanner I managed to rebuild the source code from the AST with macros preserved.
I can put the macro call instead of the expansion in places that I choose.
When I get a token, I can ask if that token is at the beginning of some macro expansion. I also can ask if the token is the end of the macro expansion.
#define NULL ((void*)0)
int * p = NULL;
So, when I parse the primary-expression ((void*)0), I can ask whether I am at the beginning of some macro expansion. The token '(' is the beginning of the expansion of NULL.
When the primary-expression ends, I ask whether the macro expansion also ended exactly at the end of the primary-expression.
If this is true, I replace the whole primary-expression with the macro call; otherwise the expansion is used.
I did this in some places (some grammar productions).
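A minimal sketch of that begin/end check, assuming the scanner tags every token with an id for the outermost expansion it came from. The Token layout and the names starts_expansion, ends_expansion and macro_for_span are hypothetical; the post only says that the scanner can answer these two questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token layout. Each macro call gets a fresh expansion_id,
   so two adjacent calls of the same macro stay distinguishable. */
typedef struct {
    const char *text;
    int expansion_id;       /* -1 when the token was written in the source */
    const char *macro_name; /* NULL when expansion_id == -1 */
} Token;

/* Is token i the first token of a macro expansion? */
static bool starts_expansion(const Token *t, size_t i)
{
    return t[i].expansion_id >= 0 &&
           (i == 0 || t[i - 1].expansion_id != t[i].expansion_id);
}

/* Is token i the last token of a macro expansion? */
static bool ends_expansion(const Token *t, size_t n, size_t i)
{
    return t[i].expansion_id >= 0 &&
           (i + 1 == n || t[i + 1].expansion_id != t[i].expansion_id);
}

/* Returns the macro name to print instead of tokens [first, last], or
   NULL when the span does not coincide with exactly one expansion. */
static const char *macro_for_span(const Token *t, size_t n,
                                  size_t first, size_t last)
{
    assert(first <= last && last < n);
    if (!starts_expansion(t, first) || !ends_expansion(t, n, last))
        return NULL;
    if (t[first].expansion_id != t[last].expansion_id)
        return NULL;
    return t[first].macro_name;
}
```

For `int * p = NULL;` the primary-expression covers exactly the tokens of one expansion, so the generator can print NULL. If an expansion starts inside the expression but ends later, or the reverse, the function returns NULL and the expanded tokens are printed.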
I don't know if anyone else is interested in this subject of rebuilding the source code, or in the preprocessor as a parser detail. I also managed to keep or drop #includes: I can generate the amalgamation if desired, or keep the includes. External includes are always kept, so the source code can be used on another platform without being rebuilt.
Well, it's interesting that some of this stuff is possible to do. And it
is intriguing how it might work.
So it sounds like macro-calls need to be well-formed, but how about
a +
#define a x
a + a;
The current PP rules say that this now becomes a+x+x, but a typical AST would be:
(add2 (add1 a1 a2) a3)
Where does the #define go? In the source, it's just after add1, so it would be:
(add2 (add1 (#define 'a' 'x') a1 a2) a3)
but its influence would apply to a2 and a3, not a1 and a2. The scope of
#defines is out of kilter with block scope and expression precedence.
Or do #defines, the ones to be part of the AST, also need to be properly
placed and follow similar rules to expressions and statements?
--
The AST nodes (as I have them today) didn't change.
But each node has a begin-list and an end-list that can be used
to keep the #defines, comments and #undefs that were previously collected
by the scanner.
So, in this case, the directive '#define a x' can be added
to the end-list of the a1 node or to the begin-list of the a2 node.
a + //a1
#define a x
a + //a2
a; //a3
The decision whether they are collected or not,
or whether they generate a warning or an error, is delegated
to the parser.
One suggestion is to allow them in the same places where _Static_assert can
go, but this is not a problem; it is just more checks in
each grammar production. If I want to regenerate /*comments*/, I
will have to check everywhere.
When I generate code I place the begin-list before the
node and end-list after.
This sample can be re-generated as it is.
a +
#define a x
a + a;
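The begin-list / end-list ordering for this sample could be sketched like this. The Node layout, the Out buffer and the emit function are assumptions made for the example; each list is flattened to a single string, and whitespace is not reproduced exactly (the sketch leaves a space after the first '+').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node shape: the post only says that each node has a
   begin-list and an end-list. */
typedef struct Node {
    const char *begin_list;      /* printed before the node */
    const char *end_list;        /* printed after the node */
    const char *text;            /* leaf text, or NULL for an addition */
    const struct Node *lhs, *rhs;
} Node;

typedef struct {
    char buf[128];
    size_t len;
} Out;

/* Appends s to the output buffer. */
static void put(Out *o, const char *s)
{
    size_t n = strlen(s);
    assert(o->len + n < sizeof o->buf);
    memcpy(o->buf + o->len, s, n + 1);
    o->len += n;
}

/* Emits begin-list, then the node itself, then end-list. */
static void emit(Out *o, const Node *n)
{
    put(o, n->begin_list);
    if (n->text != NULL) {
        put(o, n->text);
    } else {
        emit(o, n->lhs);
        put(o, " + ");
        emit(o, n->rhs);
    }
    put(o, n->end_list);
}
```

Building (add2 (add1 a1 a2) a3) with "\n#define a x\n" in the begin-list of a2 and ";" in the end-list of add2 puts the directive back between the first '+' and the second 'a'.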
When the parser reaches the primary-expression a (a2), the current token is 'x', and the
parser understands that this is the beginning of the expansion of
macro 'a'. When the token '+' becomes the current token, the
parser knows that the expansion of 'a' has ended.
It ended at the exact point where the primary-expression
ended, so I can decide to replace that primary-expression with
the macro call or do nothing.
This one
#define X a +
int main()
{
int a;
X 1;
}
Will generate
#define X a +
int main()
{
int a;
a + 1;
}
Because the expansion of X didn't end at the
same point as the primary-expression. I can decide where to
apply these rules. Currently I apply them in
primary-expressions and initializers.
For my personal use, I don't want to allow this kind of macro
expansion, or I don't care if the generated code is not similar.
For keywords I need a similar decision: I have to decide whether
to keep the macro or the keyword. A good example of this
is bool.
I am not checking anything at inner macro expansions.
#define NULL ((void*)0)
#define X 1 + NULL
int main()
{
int a;
a + X;
}
is expanded to
int main()
{
int a;
//results (( void*)0) instead of NULL because NULL
//is not recognized at inner expansions
a+1+(( void*)0);
}
Changing
#define X 1 + NULL
To
#define X (1 + NULL)
generates:
a+X;
because now
#define X (1 + NULL)
works as a primary-expression. The inner expansions are not relevant.
In my code, the first macro expansion calls another algorithm
that does all the inner expansions and returns a string.
This string is pushed onto the scanner, similar to
an #include.
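Pushing the expanded string onto the scanner could work with a stack of input buffers, as in the sketch below. InputStack, input_push and input_next_char are hypothetical names, and the sketch works on characters only. An #include would push the contents of a file in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input stack: the top entry is the buffer being read. */
enum { MAX_INPUTS = 16 };

typedef struct {
    const char *text;
    size_t pos;
} Input;

typedef struct {
    Input stack[MAX_INPUTS];
    int depth;
} InputStack;

/* Starts reading from text; the previous buffer resumes afterwards. */
static void input_push(InputStack *s, const char *text)
{
    assert(s->depth < MAX_INPUTS);
    s->stack[s->depth].text = text;
    s->stack[s->depth].pos = 0;
    s->depth++;
}

/* Returns the next character, popping exhausted buffers on the way.
   Returns '\0' when all input has been consumed. */
static char input_next_char(InputStack *s)
{
    while (s->depth > 0) {
        Input *in = &s->stack[s->depth - 1];
        if (in->text[in->pos] != '\0')
            return in->text[in->pos++];
        s->depth--; /* this buffer is done; resume the outer one */
    }
    return '\0';
}
```

After the scanner has read the name X from the source buffer, it would call input_push with the fully expanded text, so the following reads come from the expansion and then continue with whatever follows X in the source.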