
Token spacing in preprocessor #24

Open
huangguiyang opened this issue Jan 11, 2017 · 11 comments

Comments

@huangguiyang

huangguiyang commented Jan 11, 2017

Macro expansion is a tricky operation, fraught with nasty corner cases. I tried several compilers (gcc, clang, lcc, tcc, 9cc, wgtcc, 8cc) on the code snippet below, and unfortunately only gcc, clang and lcc got it right.

#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)

The correct preprocessed output is

 + + - - + + = = =

not

++ -- ++ ===
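One way a dump function can get this right is sketched below in C. It assumes (hypothetically) that every token carries a flag saying whether whitespace preceded it, roughly like GCC's `PREV_WHITE`, and that tokens produced by a macro inherit that flag from the macro name or parameter they replace. In the example, the `+` from `+PLUS`, the second `-` (after `EMPTY` vanishes) and the `=` substituted for `x` all end up without the flag, which is exactly where `++ -- ++ ===` comes from. The names `Token`, `would_paste` and `dump_tokens` are invented for illustration, and `would_paste` is a deliberately conservative approximation, not a complete C lexer check.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token record: its spelling, plus whether whitespace preceded
 * it (for tokens produced by a macro, inherited from the invocation). */
typedef struct {
    const char *spell;
    bool prev_white;
} Token;

static bool is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* A preprocessing number starts with a digit, or with '.' then a digit. */
static bool is_pp_number(const char *s) {
    return isdigit((unsigned char)s[0]) ||
           (s[0] == '.' && isdigit((unsigned char)s[1]));
}

/* Conservative check: could printing a immediately followed by b be
 * re-lexed as something other than the two tokens a and b? */
static bool would_paste(const char *a, const char *b) {
    /* Multi-character punctuators (C11 6.4.6) plus comment openers. */
    static const char *const multi[] = {
        "->", "++", "--", "<<", ">>", "<=", ">=", "==", "!=", "&&", "||",
        "...", "*=", "/=", "%=", "+=", "-=", "<<=", ">>=", "&=", "^=", "|=",
        "##", "<:", ":>", "<%", "%>", "%:", "%:%:", "//", "/*"
    };
    size_t la = strlen(a);
    if (la == 0 || b[0] == '\0')
        return false;
    char x = a[la - 1], y = b[0];

    if (is_ident_char(x) && (is_ident_char(y) || y == '"' || y == '\''))
        return true;                 /* "int" "x", or a prefix like L "\"s\"" */
    if (x == '.' && isdigit((unsigned char)y))
        return true;                 /* "." "5" -> ".5" */
    if (is_pp_number(a) && (is_ident_char(y) || y == '.'))
        return true;                 /* "1" "." -> "1." */
    if (is_pp_number(a) && (x == 'e' || x == 'E' || x == 'p' || x == 'P') &&
        (y == '+' || y == '-'))
        return true;                 /* "1e" "+" -> "1e+" */

    /* Does the boundary pair x,y occur inside some longer punctuator? */
    for (size_t i = 0; i < sizeof multi / sizeof multi[0]; i++)
        for (size_t j = 0; multi[i][j + 1] != '\0'; j++)
            if (multi[i][j] == x && multi[i][j + 1] == y)
                return true;
    return false;
}

/* Print tokens into out: one space where the source had whitespace and,
 * if avoid_paste is set, also wherever gluing two tokens would be unsafe. */
static void dump_tokens(const Token *toks, size_t n, bool avoid_paste,
                        char *out, size_t cap) {
    size_t pos = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        bool space = i > 0 &&
            (toks[i].prev_white ||
             (avoid_paste && would_paste(toks[i - 1].spell, toks[i].spell)));
        int w = snprintf(out + pos, cap - pos, "%s%s",
                         space ? " " : "", toks[i].spell);
        assert(w >= 0 && (size_t)w < cap - pos); /* sketch: no truncation handling */
        pos += (size_t)w;
    }
}
```

Fed the nine tokens of the example with those flags (`+`/no, `+`/no, `-`/yes, `-`/no, `+`/yes, `+`/no, `=`/yes, `=`/no, `=`/no), `dump_tokens(..., false, ...)` writes `++ -- ++ ===` and `dump_tokens(..., true, ...)` writes `+ + - - + + = = =`. Spaces are only added at boundaries that need them, so ordinary code such as `a+b` is left alone.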
@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

I hate spaces, stringizing, gluing and backslashes!

@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

I suspect both are correct; it's just that the preprocessor's dump function can't generate nicely readable code.

@huangguiyang
Author

The preprocessor must handle macro expansion carefully in order to get the right column numbers, etc. It's tricky, which explains why early C compilers didn't include column numbers in their diagnostic messages.

@huangguiyang
Author

I don't think they are the same token stream. To the lexer, + + means two separate + tokens, while ++ means the increment operator.
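To make the difference concrete, here is a small C sketch (the function names are hypothetical) where the only change is whether the two `+` characters are separated:

```c
#include <assert.h>
#include <stddef.h>

/* "+ +x" lexes as two unary-plus operators: the value is x, and x is untouched. */
static int unary_plus_twice(int x) {
    return + +x;
}

/* "++*p" lexes as one pre-increment operator: *p is modified. */
static int pre_increment(int *p) {
    assert(p != NULL);
    return ++*p;
}
```

If a dump function glued the two `+` tokens in `unary_plus_twice`, the re-lexed code would silently become `++x`, which returns 2 for an argument of 1 instead of 1.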

@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

The right way to check whether the compiler handles columns (or spaces) correctly is to try the snippet below:

#define PLUS +
#define EMPTY
#define f(x) =x=
#define STRINGIZE(x) #x
#define TEST(x) STRINGIZE(x)
const char* str = TEST(+PLUS -EMPTY- PLUS+ f(=));

str should be initialized with the string literal "++ -- ++ ===", which is what both wgtcc and gcc generate.

@huangguiyang
Author

huangguiyang commented Jan 11, 2017

It's just one of numerous test cases.

@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

The -E option is just for dumping the preprocessed code for the programmer to read. wgtcc's dump function is so simple that compiling the dumped code doesn't give the same result as compiling the original .c file. But that can be fixed by simply inserting a space between adjacent tokens.
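The simple fix mentioned here can be sketched in a few lines of C, assuming the dumper has an array of token spellings (`dump_spaced` is a hypothetical name, not wgtcc's actual function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write every token with one space before it (except the first), so no two
 * tokens can ever be re-lexed as one. Spellings are assumed to be complete
 * tokens, and out is assumed large enough for the result. */
static void dump_spaced(const char *const *spells, size_t n,
                        char *out, size_t cap) {
    size_t pos = 0;
    assert(cap > 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(spells[i]);
        assert(pos + (i > 0) + len < cap); /* sketch: caller sizes the buffer */
        if (i > 0)
            out[pos++] = ' ';
        memcpy(out + pos, spells[i], len);
        pos += len;
    }
    out[pos] = '\0';
}
```

This is always safe, but the output no longer resembles the source layout (`a.b` comes out as `a . b`); keeping a per-token "preceded by whitespace" flag and adding a space only where two tokens would paste is the less intrusive alternative.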

@huangguiyang
Author

huangguiyang commented Jan 11, 2017

Actually, the -E output may be consumed by other compilers. That's why it must contain line number information. In the early days, the preprocessor ran as a separate pass. Of course, if you don't intend your preprocessor to be a stand-alone one, you're right.
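GCC's -E output, for example, records positions with lines of the form `# linenum "filename"` (optionally followed by flags), so a compiler reading it can report the original locations. A minimal C sketch of how a dumper might keep its output in sync, assuming a hypothetical `OutPos` that tracks which source line the output cursor is on; the threshold of 8 blank lines before switching to a marker is an arbitrary choice here:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dumper state: the source position the output cursor is on. */
typedef struct {
    const char *file;
    unsigned line;
} OutPos;

/* Before printing a token from (file, line), bring the output in sync:
 * a few newlines for a small forward jump in the same file, otherwise a
 * "# line \"file\"" marker. Writes the text to out and returns its length. */
static int sync_line(OutPos *pos, const char *file, unsigned line,
                     char *out, size_t cap) {
    int w;
    assert(cap > 0);
    if (strcmp(pos->file, file) == 0 && line >= pos->line &&
        line - pos->line <= 8) {
        unsigned gap = line - pos->line;
        assert(gap < cap);
        memset(out, '\n', gap);
        out[gap] = '\0';
        w = (int)gap;
    } else {
        w = snprintf(out, cap, "\n# %u \"%s\"\n", line, file);
        assert(w >= 0 && (size_t)w < cap); /* sketch: no truncation handling */
    }
    pos->file = file;
    pos->line = line;
    return w;
}
```

Padding with blank lines keeps the output close to the source for small gaps, while the marker handles jumps into included files or far-away lines.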

@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

So it is the dump function that should be fixed.
I'd rather go die.

@huangguiyang
Author

The key question: where is the f**king document that completely describes these corner cases? :-(

@wgtdkp
Owner

wgtdkp commented Jan 11, 2017

Let's suicide together :)
