Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Permit more control characters in comments (#924)" #996

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

arp242
Copy link
Contributor

@arp242 arp242 commented Oct 1, 2023

This reverts commit ab74958.

I'm a simple guy. Someone reports a problem, I drink coffee and fix it. No one reports a problem? There is nothing to fix and I go drink beer.

No one really reported this as a problem, but it does introduce needless churn for all TOML implementations and the test suite. Do we need to forbid anything in comments? Probably not, and in strings we probably only need to forbid \x00. But at least before it was consistent with strings, and more importantly, what everyone wrote code for, which is tested, and already works.

None of the hypotheticals on why this is "needed" are practical issues people reported, and most aren't even fixed: a comment can still invalidate the file, you must still parse each character in a comment as some are still forbidden, the performance benefits are very close to zero they might as well be zero, and you still can't "dump whatever you like" in comments.

So it doesn't actually change anything, it just changes "disallow this set of control characters" to ... "disallow this set of control characters" (but for a different set). That's not really a substantial or meaningful change. The only (minor) real-world issue that was reported (from the person doing the Java implementation) was that "it's substantially more complicated to parse out control characters in comments and raise an error, and this kind of strictness provides no real advantage to users". And that's not addressed at all with this, so...


And while I'm at it, let me have a complaint about how this was merged:

  1. Two people, both of whom actually maintain implementations, say they don't like this change.
  2. This is basically ignored.
  3. Three people (none of whom maintain an implementation, AFAIK) continue written a fairly large number of large comments, so anyone who wasn't already interested in this change unsubscribes and/or goes 🤷
  4. "Consensus".

Sometimes I feel TOML attracts people who like to argue things from a mile-high ivory tower with abstract arguments that have only passing familiarity with any actual pragmatic reality.

Fixes #995

This reverts commit ab74958.

I'm a simple guy. Someone reports a problem, I drink coffee and fix it. No one reports a problem? There is nothing to fix and I go drink beer.

No one really reported this as a problem, but it *does* introduce needless churn for all TOML implementations and the test suite. Do we need to forbid *anything* in comments? Probably not, and in strings we probably only need to forbid \x00. But at least before it was consistent with strings, and more importantly, what everyone wrote code for, which is tested, and already works.

[None of the hypotheticals](toml-lang#567 (comment)) on why this is "needed" are practical issues people reported, and most aren't even fixed: a comment can still invalidate the file, you must still parse each character in a comment as some are still forbidden, the performance benefits are very close to zero they might as well be zero, and you still can't "dump whatever you like" in comments.

So it doesn't *actually* change anything, it just changes "disallow this set of control characters" to ... "disallow this set of control characters" (but for a different set). That's not really a substantial or meaningful change. The only (minor) real-world issue that was reported (from the person doing the Java implementation) was that "it's substantially more complicated to parse out control characters in comments and raise an error, and this kind of strictness provides no real advantage to users". And that's not addressed at all with this, so...

---

And while I'm at it, let me have a complaint about how this was merged:

1. Two people, both of whom actually maintain implementations, say they don't like this change.
2. This is basically ignored.
3. Three people continue written a fairly large number of large comments, so anyone who wasn't already interested in this change unsubscribes and/or goes 🤷
4. "Consensus".

Sometimes I feel TOML attracts people who like to argue things from a mile-high ivory tower with abstract arguments that have only passing familiarity with any actual pragmatic reality.

Fixes toml-lang#995
@ChristianSi
Copy link
Contributor

ChristianSi commented Oct 2, 2023

No one really reported this as a problem

I'd consider @abelbraaksma comment that triggered the whole change indeed a bug report, even though it's not a bug he had run into personally. He got 4 thumps-ups, and in the whole discussion that followed in that issue not a single person argued that the restrictive 1.0 status (where comments must fulfill most string requirements, despite not really being strings) should be kept. Plus, though you say that implementation maintainers didn't like this, the very next comment says:

As an implementation author, I agree with @abelbraaksma above.

As for the later discussion in #924, I suppose you refer to yourself (#924 (comment)) and @hukkin (#924 (comment)) expressing doubts? While the latter talked about "potential busywork for implementation maintainers", their main concern seem to have been about "about allowing lone CR in comments" – and this was subsequently addressed by keeping them indeed forbidden. Plus another maintainer, @marzer, explicitly welcomed the change (#924 (comment): "status quo bad").

And you yourself write above:

Do we need to forbid anything in comments? Probably not...

Arguably, the current main, which you wish to revert, comes closer to that ideal ("forbid as little as possible in comments"), while the old 1.0 standard is to forbid a lot, without anyone seeming to know exactly why.

Also I'm a bit concerned by your suggestion that TOML maintainers are more important than other people here, that is, TOML users. Now admittedly, maintainers are essential, since without them there would be nothing to use. But users are important too, since without them there would be no point in maintaining anything. So I'd suggest not to play them out against each other.

Plus there is the workload of future maintainers, of all the TOML implementations that don't exist yet but will hopefully be written some day. I'd say that while the difference is not large, for them life has become a bit simpler thanks to the accepted change (#924). In 1.0, comments are essentially a fifth string type – not exactly matching any of the four actual string types, but still having similar requirements when it comes to error checking. In the future, error checking in comments is considerably simplified, since you only have to check for CR (if line splitting has already happened) or you have to check for CR or LF to find the end of the comment (if not – with the former, if not followed by LF, signaling an error). That's arguable more simple than having to scan for arbitrary control characters and confusing the user with strange error messages ("illegal comment").

@ChristianSi
Copy link
Contributor

ChristianSi commented Oct 2, 2023

A small correction: Rechecking the ABNF, I see that VT (vertical tab) and FF (form feed) are still forbidden too, apparently due to #924 (comment). That wasn't much discussed at the time. If you (or anybody else here) makes the case that they too should be allowed in comments, to make the code simpler and more consistent, I think I'd be on board with that – if they occur at all, they're unlikely to cause problems, so there doesn't seem to be a particular reason to keep them forbidden.

@arp242
Copy link
Contributor Author

arp242 commented Oct 2, 2023

I'd consider abelbraaksma comment that triggered the whole change indeed a bug report, even though it's not a bug he had run into personally. He got 4 thumps-ups

That comment was never implemented, so those thumbs-up are rather meaningless. cleishm will still have the same problems as well so that's meaningless as well.

But users are important too, since without them there would be no point in maintaining anything. So I'd suggest not to play them out against each other.

My entire point is that for users this change makes zero practical difference.

Plus there is the workload of future maintainers

For future maintainers it's more work, or at the very least the same amount of work. You still need to put in the plumbing which makes this hard (hard-ish anyway) in some cases. Your "small correction" is not "small". It's actually easier with 1.0 because now you can just reject these control characters outright in your tokenizer as they're never valid anywhere (since strings and comments have the same set). This is also what makes this more work than "just" an updated range.

This is the entire problem, really. You all went off to have your own abstract discussions without even consulting the people who raised implementation concerns and came up with a "solution" which didn't even address the actual concerns that were raised, and instead you came up with some middle of the ground thing that has more or less the downsides of everything and the upsides of nothing.

This is also why:

Also I'm a bit concerned by your suggestion that TOML maintainers are more important than other people here

That's because they are, kind of anyway. Free-range no-consequence discussions are easy if you're off-loading the work to others.

If you think it's so important, then why don't you go and fix things? If you had actually put the work then I could have had some respect for it.

Now, I'm willing to listen to anyone, but this just doesn't really change anything for anyone in any meaningful way and the only real-world consequence is churn for implementers, so, you know, if you really want to change it then put in the work? That's really how it works pretty much for any volunteer effort.

@ChristianSi
Copy link
Contributor

ChristianSi commented Oct 2, 2023

You're right that the current spec is a kinda strange middle ground that doesn't make all that much sense, and it's different from what I had remembered. But I think the right way is not backward, but forward by allowing VT and FF in as well. NUL and the line terminator chars CR and LF should remain forbidden, it stands to reason, but nothing else.

I'll see if I can create a PR one of these days.

@ChristianSi
Copy link
Contributor

ChristianSi commented Feb 16, 2024

A careful reader has found out that 1.0 is inconsistent in its rules regarding comments – specifically, the ABNF allows U+007F (DEL), while the written spec forbids it. So "revert to the old status quo" is simply not an option, since the old status quo is buggy and needs to be fixed.

@pradyunsg
Copy link
Member

That was already spotted and fixed (as mentioned in #995's closing comment).

@ChristianSi
Copy link
Contributor

Spotted and fixed, yes, but only if we don't revert. So let's not – the old status quo, since buggy, is simply no longer an option.

@arp242
Copy link
Contributor Author

arp242 commented Apr 12, 2024

That's a simple typographical fix, unrelated to the issues raised here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excluding LF in 'allowed-comment-char' is confusing
3 participants