Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect handling of base URL in URLPatternInit processing #202

Closed
rubycon opened this issue Jan 12, 2024 · 10 comments
Closed

Incorrect handling of base URL in URLPatternInit processing #202

rubycon opened this issue Jan 12, 2024 · 10 comments

Comments

@rubycon
Copy link
Contributor

rubycon commented Jan 12, 2024

What is the issue with the URL Pattern Standard?

A correct implementation of the current spec fail with this kind of simple input:

new URLPattern({ pathname: '/books/:id', baseURL: 'https://example.com' });

The issue comes in part from the use of URL spec's internal parsers. These parsers manipulate URL records with some fields (host, port, query, fragment) that default to null when empty.

When those empty fields are then used to fill the default url pattern (step 11-7 to 11-10 of process a URLPatternInit), the processing a base URL string assertion fail (Assert: input is not null.)

In the special case of the baseURL port, URL record use unsigned integers and not strings, so the assertion pass but the code fail further down in escape a pattern string (1. Assert: input is an ASCII string.).

As per Web IDL, implicit string conversion would yield "null" for those empty field (as the URL record struct is not marked [LegacyNullToEmptyString]) which is not what we want.

The spec should check if a baseURL field is not null before using it to fill the url pattern. As for the port it should be converted to a string but is the processing of the stringified port really necessary as we are certain its only ASCII characters and there's nothing to escape ?).

@rubycon rubycon changed the title Incorrect handling of baseURL in URLPatternInit processing Incorrect handling of base URL in URLPatternInit processing Jan 12, 2024
@jeremyroman
Copy link
Collaborator

I don't think WebIDL conversion is applicable here, but nonetheless I agree that we need to replace null with an empty string in a few places here.

jeremyroman added a commit to jeremyroman/urlpattern that referenced this issue Jan 12, 2024
Some of these values may be null (or in the case of port, an integer)
and must be converted to strings before they can be applied to
URLPatternInit.

Fixes whatwg#202.
jeremyroman added a commit to jeremyroman/urlpattern that referenced this issue Jan 12, 2024
Some of these values may be null (or in the case of port, an integer)
and must be converted to strings before they can be applied to
URLPatternInit.

Fixes whatwg#202.
@rubycon
Copy link
Contributor Author

rubycon commented Jan 12, 2024

@jeremyroman There's somewhat a related bug when creating a pattern parser with some of the encoding callbacks.

These callbacks also rely on URL spec's basic parser and do not check for null value with the exception of the port canonicalizing callback but as with the base URL the integer port should also be serialized or it fail the first escape a regexp string assertion.

As I understand these callbacks should always return a string, as all patterns are string.

@jeremyroman
Copy link
Collaborator

Acknowledged on the port needing to be serialized there. The others seem fine to me, as they initialize the URL record's relevant field into a non-null state and then proceed with parsing from a state that I think will never set them back to null. Can you be more specific?

@rubycon
Copy link
Contributor Author

rubycon commented Jan 16, 2024

@jeremyroman Yes, it seems fine for the other components as null is always replaced by "*" during the component compilation.

But should processedInit fields be set to null in the first place when they must be USVString . If it's OK to do so for internal structs that are not exposed through the API I think it shouldn't be an issue.

This issue is also discussed here #204 (comment)

jeremyroman added a commit that referenced this issue Jan 16, 2024
Some of these values may be null (or in the case of port, an integer) and
must be converted to strings before they can be applied to URLPatternInit.

The conversion of the port to an integer is also necessary in the encoding
callbacks.

Addresses #202, at least in part.
@jeremyroman
Copy link
Collaborator

Well, it's not that they must be USVString, it's that they must be USVString if they are present. Web IDL dictionaries are ordered maps and the dictionary members are only required keys if they are specified as required.

It is incorrect to initialize these to null; instead, we should be conditionally assigning them and leaving them absent from the map otherwise.

jeremyroman added a commit that referenced this issue Jan 30, 2024
It is not valid for these dictionary members to be null, since they are defined to, if present, hold a USVString value. Instead, these should be omitted from the result dictionary altogether if no string was provided to this algorithm.

This addresses the concern raised in #202 (comment).
@rubycon
Copy link
Contributor Author

rubycon commented Feb 11, 2024

Hi @jeremyroman, hostnames seems to suffer from the same issue. The basic URL parser will parse a host as a hostname or an IP address. Addresses can be represented as an single integer or as an array of integer, so they also need to be explicitly serialized, here :

To canonicalize a hostname given a string value

  • 5. Return dummyURL’s host.

The parsed host and port in the match steps will also need to be properly serialized :

To perform a match given a URLPattern urlpattern, a URLPatternInput input, and an optional string baseURLString:

  • 12. Otherwise:
    • 8. Set hostname to url’s host or the empty string if the value is null.
    • 9. Set port to url’s port or the empty string if the value is null.

@rubycon
Copy link
Contributor Author

rubycon commented Feb 11, 2024

@jeremyroman Also: minor typo found in canonicalize a port. Argument passed as portValue and then referred as value.

@rubycon
Copy link
Contributor Author

rubycon commented Apr 17, 2024

@jeremyroman @annevk I submitted a PR to fix this issue + the minor typo but it seems it's still pending.
I applied Jeremy recommendations to my commit message and submitted a participation agreement.

I'm new to this process, is there something else I'm missing ?

jeremyroman pushed a commit that referenced this issue Apr 19, 2024
The URL parser return port as an integer and host as either a hostname string, an integer (IPv4 address) or an array of integers (IPv6 address). For further processing by the URLPattern code these properties need to be properly converted to string.

The issue was discussed here: #202
@jeremyroman
Copy link
Collaborator

@rubycon did your PR fully resolve this issue? (i.e., shall I close it now?)

@rubycon
Copy link
Contributor Author

rubycon commented May 16, 2024

@jeremyroman Yes, it's fully resolved!

@annevk annevk closed this as completed May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants