s***@casperkitty.com
2016-03-30 16:06:35 UTC
One strange quirk of the C Standard is that the tokenization rules which allow
plus and minus signs within numeric preprocessing tokens (after an "e" or "E")
don't distinguish between tokens that start with 0x and those which don't. As a
consequence, something like 0x1e+1 is treated by the Standard not as a sequence
of three tokens, but as one invalid token.
I would propose that the rule for token binding be changed so that a token
which starts with "0x" is terminated by a plus or minus sign, even if the
last character of the token so far was an "e" or "E"; or, alternatively, that
when a token of the form 0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be
decomposed into two or three tokens. These behaviors would be almost
equivalent, differing only in some corner cases involving macros which would
be unlikely to arise outside of contrived situations, but which should be
acknowledged even if left unspecified. [To minimize the burden on compilers,
I think it would be best to say explicitly that a compiler is allowed to
process something like
#define paste(x,y) x##y
paste(0x,1e+1)
as though it generated the distinct tokens 0x1e + 1, but is also allowed to
treat the output of paste as the single token 0x1e+1, which would be invalid
for most purposes.]
I can't think of any realistic scenario where production code might
plausibly rely upon the currently mandated token-binding behavior, but I'm not as
creative with the preprocessor as some people, so there may be useful
cases I'm not aware of.