Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module loader does not allow parameters in data: URIs #30488

Open
jbemmel opened this issue Nov 14, 2019 · 6 comments
Open

Module loader does not allow parameters in data: URIs #30488

jbemmel opened this issue Nov 14, 2019 · 6 comments
Labels
blocked PRs that are blocked by other issues or PRs. known limitation Issues that are identified as known limitations.

Comments

@jbemmel
Copy link

jbemmel commented Nov 14, 2019

The regex at https://github.com/nodejs/node/blob/master/lib/internal/modules/esm/translators.js#L42 defines the allowable data: URIs:

const DATA_URL_PATTERN = /^[^/]+\/[^,;]+(;base64)?,([\s\S]*)$/;

This rejects valid RFC2397 data: URIs with media parameters, like
data:text/javascript;charset=utf-8;base64,xxxx

Likewise, a valid URI like
data:,A%20brief%20note
is rejected ( '/' character is required in the media type - even though the media type is not used )

@devsnek
Copy link
Member

devsnek commented Nov 14, 2019

cc @bmeck

iirc adding a compliant parser was blocked for some reason

@bmeck bmeck added blocked PRs that are blocked by other issues or PRs. known limitation Issues that are identified as known limitations. labels Nov 14, 2019
@bmeck
Copy link
Member

bmeck commented Nov 14, 2019

Correct, see #21128

@guybedford
Copy link
Contributor

A more progressive fix here might just be to extend the RegEx to support the specific cases in use here. I understand this is a slippery slope, but sometimes the small steps evolution approach can work best as opposed to blocking this on the larger effort (and both don't approaches aren't mutually exclusive so far as shipping goes - if the better approach is unblocked then that can win out).

A small regex adjustment, something like -

const DATA_URL_PATTERN = /^([^/]+\/[^,;]+)?(;charset=utf-?8)?(;base64)?,([\s\S]*)$/;

may handle these common cases better. An explicit charset list handling seems sensible given that we don't currently support other charsets.

@jbemmel if you'd be interested in working on a small PR to add this feature that would be very welcome.

@jbemmel
Copy link
Author

jbemmel commented Nov 18, 2019

I was about to argue the same point, but my proposal would be something like this:

const DATA_URL_PATTERN = /^([^/]+\/[^,]+?)?(;base64)?,([\s\S]*)$/;

I quoted those examples because they are mentioned in the RFC, and I think NodeJS should at least allow those. However, my use case is slightly different - I would like to use data: URIs to build a http(s) module loader. In that context, I need to retain the URL from which a given data: URI was fetched - so I added a custom 'url' parameter, which gets rejected.
Since neither the media type nor any parameters except the 'base64' flag are referenced by the NodeJS code, it should simply ignore those imho. And if at a later point the media type parser gets added and it starts checking for 'text/javascript' then that will be fine - but for now, a small change could allow valid data: URIs

@bmeck
Copy link
Member

bmeck commented Nov 22, 2019

A PR has been opened to ignore parameters based upon the WHATWG splitting point for data: URLs at #30593 .

@bmeck
Copy link
Member

bmeck commented Nov 26, 2019

This should be fixed in master, please be aware you will need to use URL encoding for anything with a , and the WHATWG MIME parsing recommendation is slightly unclear on decoding of those currently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked PRs that are blocked by other issues or PRs. known limitation Issues that are identified as known limitations.
Projects
None yet
Development

No branches or pull requests

4 participants