Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What to do about validation errors #156

Open
domenic opened this issue May 14, 2020 · 0 comments
Open

What to do about validation errors #156

domenic opened this issue May 14, 2020 · 0 comments

Comments

@domenic
Copy link
Member

domenic commented May 14, 2020

The URL standard is getting more sophisticated about validation errors. See e.g. whatwg/url#406.

There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in whatwg/url#437 (comment). I'll call these "parser validity" and "writing validity", since the latter is derived from the URL writing section.

It would be very neat if this package could provide:

  • Whether a URL is writing-valid. (At least for "absolute-URL-with-fragment string"; maybe relative URLs are out of scope.)
  • Whether a URL is parser-valid, including the name of any validation error(s) triggered during parsing.
  • Surface this information in the live URL viewer.
  • A test corpus of valid/invalid URLs, maybe drawn from urltestdata.json. We should at least have one parser-invalid URL for every named parser validation error, and it'd also be good to have writing-invalid URLs derived from systematically violating every rule in the writing URLs section.
  • Bonus: a fuzzer that generates strings and tests if they are parser-valid but writing-invalid, or parser-invalid but writing-valid.

Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting urlStateMachine.parseError to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?

We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.


One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.

On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in URL constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.

We could address this in a number of ways, e.g.:

mattclough1 pushed a commit to mattclough1/whatwg-url that referenced this issue Oct 14, 2021
Escape text from custom transformTags functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant