Handling of special character in hostname #206

Open
rubycon opened this issue Jan 19, 2024 · 0 comments

rubycon commented Jan 19, 2024

What is the issue with the URL Pattern Standard?

According to the Web Platform Tests, these hostnames should throw a TypeError:

  • bad/hostname
  • bad#hostname
  • bad%hostname
  • bad\:hostname
  • bad\nhostname
  • bad\rhostname
  • bad\thostname
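A minimal reproduction sketch for the list above, assuming a runtime that exposes a global URLPattern constructor (the block skips the check when it is absent). The helper name classifyHostnames is hypothetical and only used for illustration; it records whether constructing a pattern with each hostname throws a TypeError.

```javascript
// The seven hostnames from the list above. "\\:" is a literal backslash
// followed by a colon (a pattern escape); "\n", "\r" and "\t" are the
// actual control characters.
const hostnames = [
  "bad/hostname",
  "bad#hostname",
  "bad%hostname",
  "bad\\:hostname",
  "bad\nhostname",
  "bad\rhostname",
  "bad\thostname",
];

// Hypothetical helper: try to construct a pattern for each hostname and
// record the outcome. PatternCtor is passed in so that the helper does not
// depend on URLPattern being defined.
function classifyHostnames(PatternCtor, names) {
  const results = new Map();
  for (const hostname of names) {
    try {
      new PatternCtor({ hostname });
      results.set(hostname, "no error");
    } catch (err) {
      results.set(hostname, err instanceof TypeError ? "TypeError" : String(err));
    }
  }
  return results;
}

// Only run against the real implementation when the runtime provides one.
if (typeof URLPattern !== "undefined") {
  for (const [hostname, outcome] of classifyHostnames(URLPattern, hostnames)) {
    console.log(JSON.stringify(hostname), "->", outcome);
  }
}
```

The output depends on the implementation under test, so no expected result is given here.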

However, hostname validation relies almost entirely on the URL spec's internal basic URL parser, and according to that spec these cases do not result in a TypeError.

After they're passed to the constructor, they go through the initialize steps and are passed to process a URLPatternInit, but they are not validated there because they're patterns. They are then passed to compile a component with the canonicalize a hostname callback, and finally to the basic URL parser with an empty URL record and state override set to hostname state.

  • bad\nhostname, bad\rhostname, bad\thostname: The basic URL parser strips all ASCII tabs and newlines before processing the input.

    2. If input contains any ASCII tab or newline, invalid-URL-unit validation error.

    3. Remove all ASCII tab or newline from input.

    So these three strings are treated as badhostname and no error is thrown, although a non-failing invalid-URL-unit validation error is reported. This behaviour is consistent with the external URL API (e.g. new URL("http://bad\nhostname") is OK).
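The stripping step for this case can be observed through the public URL API. The sketch below models the quoted "remove all ASCII tab or newline" step with a hypothetical helper, stripTabsAndNewlines, and compares it with what new URL() reports. It goes through the full parser rather than the state-override entry point used by URL Pattern.

```javascript
// Model of the quoted step: remove U+0009, U+000A and U+000D from the input.
function stripTabsAndNewlines(input) {
  return input.replace(/[\t\n\r]/g, "");
}

const controlCharInputs = ["bad\nhostname", "bad\rhostname", "bad\thostname"];

for (const input of controlCharInputs) {
  const modelled = stripTabsAndNewlines(input);
  const parsed = new URL("http://" + input).hostname;
  // Both values are "badhostname" for all three inputs.
  console.log(JSON.stringify(input), "->", modelled, parsed);
}
```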

  • bad/hostname and bad#hostname: The URL parser stops processing the input at the special character and returns only bad, which validates without error.

    3. Otherwise, if one of the following is true:

    • c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
    • url is special and c is U+005C (\)

    bad?hostname fails in the pattern parser, which expects the ? modifier to be the last character.
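The same truncation at / and # is visible through the public URL API. This is only an analogue: the constructor runs the full parser with an assumed http scheme, not the hostname state override used by URL Pattern. The helper name parsedHostname is hypothetical.

```javascript
// Parse the input as the authority of an http URL and return the hostname
// that the parser recorded.
function parsedHostname(input) {
  return new URL("http://" + input).hostname;
}

console.log(parsedHostname("bad/hostname")); // "bad" ("/hostname" becomes the path)
console.log(parsedHostname("bad#hostname")); // "bad" ("#hostname" becomes the fragment)
```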

  • bad\:hostname: The : character is escaped in the pattern parser, so bad:hostname is passed to the URL parser. When the parser encounters the : character with state override set to hostname state, it returns without processing any hostname.

    2. Otherwise, if c is U+003A (:) and insideBrackets is false, then:

    2. If state override is given and state override is hostname state, then return.

    After returning, the hostname is null, and the code later fails on an assertion when running generate a regular expression and name list.
    This case looks more like a URL spec issue: it is not consistent with the handling of the /, ? and # delimiters.
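The inconsistency can be sketched with a toy model of the host state as quoted in this issue. This is a deliberate simplification, not the spec algorithm: modelHostnameState is a hypothetical name, and the model omits the empty-buffer checks, the backslash delimiter for special URLs, and host parsing itself. It only shows that : yields no hostname while /, ? and # yield the text before the delimiter.

```javascript
// Toy model of the host state with state override = hostname state.
// Returns the buffer that would be handed to the host parser, or null when
// the state returns early without recording a hostname.
function modelHostnameState(input) {
  let buffer = "";
  let insideBrackets = false;
  for (const c of input) {
    if (c === ":" && !insideBrackets) {
      // Quoted step: with a hostname state override, return immediately.
      return null;
    }
    if (c === "/" || c === "?" || c === "#") {
      // Quoted step: a delimiter ends the hostname; the buffer is kept.
      break;
    }
    if (c === "[") insideBrackets = true;
    if (c === "]") insideBrackets = false;
    buffer += c;
  }
  return buffer;
}

console.log(modelHostnameState("bad:hostname")); // null
console.log(modelHostnameState("bad/hostname")); // "bad"
console.log(modelHostnameState("bad#hostname")); // "bad"
```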

  • bad%hostname: The hostname is fully parsed by the URL parser and passed to the host parser as an opaque host. The % is allowed in an opaque host, but only as part of a percent-encoded byte, so a non-failing invalid-URL-unit validation error occurs.

    3. If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits, invalid-URL-unit validation error.
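The quoted percent check can be expressed as a small predicate. This is a sketch of that single step only, under the assumption that it is evaluated on the raw input string; hasInvalidPercent is a hypothetical name. Since the validation error is non-failing, a true result here would not by itself cause a TypeError.

```javascript
// True when the input contains a "%" that is not followed by two ASCII hex
// digits, which is the condition for the invalid-URL-unit validation error.
function hasInvalidPercent(input) {
  return /%(?![0-9A-Fa-f]{2})/.test(input);
}

console.log(hasInvalidPercent("bad%hostname"));   // true ("%ho" is not a percent-encoded byte)
console.log(hasInvalidPercent("bad%20hostname")); // false ("%20" is well formed)
```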
