Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to specify dependency on unspecified behavior #6

Open
mycoboco opened this issue Nov 16, 2017 · 6 comments
Open

Need to specify dependency on unspecified behavior #6

mycoboco opened this issue Nov 16, 2017 · 6 comments

Comments

@mycoboco
Copy link

It seems to necessary to specify that the code to enable recursive macro expansion is not conforming to the C Standard.

FOREVER_INDIRECT in the wiki page is defined not to have in it parentheses to invoke FOREVER because that would stop further expansion of FOREVER_INDIRECT. This is, however, is an unspecified behavior and a standard-conforming compiler is free to stop the expansion; refer to DR017Q19 for details.

The reason that most compilers drop the FOREVER_INDIRECT's context when brining parentheses from the outside of its replacement list is that doing so is easier than to keep it.

I think this should be described in the wiki page.

Thanks.

@pfultz2
Copy link
Owner

pfultz2 commented Nov 16, 2017

No its not. To interpret the following:

#define f(a) a*g
#define g(a) f(a)
f(2)(9) 

To expand as 2*f(9) would require a recursive model of expansion which would not be conforming. The standard clearly implies an iterative model(such as a DAG) as a macro invocation is replaced by the preprocessing tokens of the replacement list, not by the rescanned (i.e. macro-expanded) preprocessing tokens of the replacement list.

See 6.10.3/9:

... that causes each subsequent instance of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.

And 6.10.3/10:

... Each subsequent instance of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition.

Furthermore, an implementation can't consider possible future invocation by looking ahead for a ( token as the rest of the preprocessing
tokens are not to be considered from 6.10.3.4/2:

If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the file's preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced it is not replaced.

DR017 is an old DR that seems to ignore the original intent of the standard text which was derived from Dave Prosser's pseudo-algorithm with the goal of allowing as much expansion as possible while avoiding infinite recursion. The similar C++ DR268 which resulted from a discussion with Dave Prosser shows that it should expand to 2*9*g:

The original intent of the J11 committee in this text was that the result should be 42, as demonstrated by the original pseudo-code description of the replacement algorithm provided by Dave Prosser, its author.

Also, with n3882 this behavior could be further clarified in future versions of C++.

@mycoboco
Copy link
Author

mycoboco commented Nov 17, 2017

I am not talking about a proposed text for the Standard, but about the text of the Standard.

Even if I read the Dave Prosser's pseudo code after I wrote my preprocessor, I already had known of it and its intent. The truth is that the committee decided to endorse an implementation that does not follow the intent by not endorsing code that depends on the unspecified behavior. With the current state of the Standard, nothing prevents an implementation that generates 2*f(9) from being called conforming.

DR017Q19 and other questions and proposals some of which you cited above arise repeatedly because the current interpretation of the Standard allows ambiguity, not because it carries some unintented wording. The authoritative response to DR017Q19 and the informative text added in an annex by C90 COR and kept by the later revisions also agree with that.

The fact that some of the committee documents for proposals contain something does not mean they will make their way into the normative text of the Standard. For example, there were several proposals to resolve the trailing comma problem of variadic macros (including gcc's , ## __VA_ARGS__ extension), none of them, however, succeeded to make a way into the standard. I know a solution using __VA_OPT__ is gaining some interest and major compilers start to implement it, but we cannot consider the feature to be a standard one until it is specified by the normative text of the future standard.

I totally agree that most compilers have no problem with your code, because whatever model a preprocessor hires (iterative one vs. recursive one), forgetting an expansion context as early as possible is easier to implement. There was a comp.std.c discussion where one of former authors of gcc's cpp insisted that the standard preprocessing model is recursive and Doug Gwyn, a member of J11 argued against it. But, as we all know, the cpp from gcc has never stopped at 2*f(9).

@pfultz2
Copy link
Owner

pfultz2 commented Nov 17, 2017

I am not talking about a proposed text for the Standard, but about the text of the Standard.

I am talking about the text of the standard as well. The purpose of the proposals are only to clarify the current text not to define new behavior.

Even if I read the Dave Prosser's pseudo code after I wrote my preprocessor, I already had known of it and its intent. The truth is that the committee decided to endorse an implementation that does not follow the intent by not endorsing code that depends on the unspecified behavior.

The committee chose to use a simpler method for the standard text to achieve the same results(ie intent). The original intent is very much relevant when the text becomes unclear.

With the current state of the Standard, nothing prevents an implementation that generates 2*f(9) from being called conforming.

This is incorrect. The invocation of g is not a nested invocation of f because there is no left parenthesis found during the rescanning of f's replacement list. Scanning must continue out of g's replacement list in order to find the left parenthesis and is therefore a macro invocation, which means a conforming preprocessor according to the text of the standard must continue with replacements producing 2*9*g.

There were a comp.std.c discussion where one of former authors of gcc's cpp insisted that the standard preprocessing model is recursive and Doug Gwyn, a member of J11 argued against it.

I think you have that backwards. Doug Gwyn has argued that the PP model is recursive even though that is not what the current standard specifies nor the original intent(which he was not part of the original implementation).

@mycoboco
Copy link
Author

mycoboco commented Nov 17, 2017

If you think the purpose of the proposals is for clarification, then they are actually trying to change the text that the committee intentionally left ambiguous.

If you think the committee chose to use a simpler method for delivering the intent (which was described by Dave Prosser's pseudo code), how can we explain the item added in an annex, which allows an implementation to generate 2*f(9)? And what about the related explanation in C90/C99 Rationale? Should we consider that those documents published by the committee to explain the intended interpretation have no significance while proposals that can contain anything its author wants do?

This is incorrect. The invocation of g is not a nested invocation of f because there is no left parenthesis found during the rescanning of f's replacement list. Scanning must continue out of g's replacement list in order to find the left parenthesis and is therefore a macro invocation, which means a conforming preprocessor according to the text of the standard must continue with replacements producing 29g.

The committee's interpretation and the informative text that must be consistent with the normative part clearly disagree with your interpretation. I am afraid that this kind of discussion is not quite meaningful here because it has been done many times on comp.std.c and, unfortunately I have not enough time for it. If you really think your interpretation is correct, I suggest to start a discussion on comp.std.c.

I think you have that backwards. Doug Gwyn has argued that the PP model is recursive even though that is not what the current standard specifies nor the original intent(which he was not part of the original implementation).

I should have said IIRC ;-) A recursive approach has no problem (and work better in certain cases) provided that it meets all the requirements of the standard. And Dave Prosser already recognized that the drafted wording for macro rescanning did not completely mimic his algorithm:

It defines a behavior for the two currently unspecified parts of the Standard’s macro expansion process. Be assured that these two parts only come into play when the expansion process is abused (when varying hide sets are intermixed) and as such do not have any effect on ‘‘real’’ programs.
...
The intersection operation above and the one below in the glue function are both one possible
implementation choice that the draft leaves unspecified. This algorithm chose the intersection operation as it gives the most amount of macro replacement, still without causing infinite loops, etc.

And this agrees with Doug Gwyn's recall:

The "original intent" is a red herring; in fact the C
Standard did not spell out the model with the clarity
of an algorithm such as Dave's, alas, and (although I'd
have to rummage through notes long buried) there is no
particular reason to think that the text of the standard
as eventually negotiated and published was or was not
meant to prcisely mirror Dave's algorithm. In fact I
rather recall that we did make some adjustments, in the
belief that the very examples under discussion could
reasonably be done either way, and we didn't want to
precude either form of implementation.

@pfultz2
Copy link
Owner

pfultz2 commented Nov 17, 2017

If you think the committee chose to use a simpler method for delivering the intent (which was described by Dave Prosser's pseudo code), how can we explain the item added in an annex, which allows an implementation to generate 2*f(9)?

It seems they disregarded both the original intent and text of the standard. Perhaps at the time they thought such an implementation could be conforming, but its not. Even decades later there has yet to be a conforming preprocessor to implement it this way.

The committee's interpretation and the informative text that must be consistent with the normative part clearly disagree with your interpretation.

No, the DR17Q19 merely states that such an implementation could be allowed, however, there no interpretation of the text that could allow such an implementation.

A recursive approach has no problem (and work better in certain cases) provided that it meets all the requirements of the standard.

Yes, provided it meets the requirements of the standard, which it cannot. The standard makes it clear that replacement must happen separately before rescanning. So recursively applying replacements and then rescanning is non-conforming.

@mycoboco
Copy link
Author

mycoboco commented Nov 18, 2017

It seems they disregarded both the original intent ...

The goal of the C90 Committee to write wording for preprocessing was not to move Dave's algorithm into plain English without any change. So, I think we need to separate the Committee's intent from Dave Prosser's code and to stop calling the code as the original intent, because we have:

  • more than one committee members who said the intent was to allow the unspecified behavior;
  • the writer of the code who explicitly said that the wording in the C90 differed from his code in allowing implementation choices;
  • a Committee response to the relevant DR that confirmed what the intent was; and
  • the annex item (with a minor modification) that survived several revisions since C90.

I carefully avoid saying that the Standard's wording should be read as to accommodate implementation choices because I think that is the point where we disagree.

but its not ...
there no interpretation of the text that could allow such an implementation
which it cannot. The standard makes it clear that replacement must happen separately before rescanning. So recursively applying replacements and then rescanning is non-conforming.

Your point seems that the literal reading of the Standard is not to allow conforming preprocessors to keep the blue-painted set when bringing an open parenthesis beyond some boundaries. I could argue that a reading of text gives a different interpretation, but, again I think this, a github issue page is not a proper place to discuss it. I learned from my experiences on comp.std.c that this kind of discussions takes much time to figure out each others' understanding and misunderstanding, which is why I suggest to open a new discussion on the newsgroup to see how others read the text.

The text of the Standard is written to deliver the intent of the Committee and a DR should be filed when it fails. You insist that the text in question doesn't allow implementation choices when handling blue-painted tokens, which implies a DR to change the text or their intent should be in order, and as a practical user (as opposed to a theoretical one) of the Standard, I think taking the intent behind the wording (as opposed to taking its literal reading) is a proper approach until the DR gets a proper response.

Your ideas for implementing detection and iteration with preprocessing features are interesting, and I don't think adding a note about the Committee's intent for the behavior on which your code depends reduces fun of reading the code. Or you are free to leave it as unspecified, as the Committee does, if you don't care whether or not people ignorant of this story mistake it is unarguably safe to depend on the behavior in their code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants