Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web platform tests disagree on how to handle blob content-type validation #189

Open
wisniewskit opened this issue Apr 26, 2024 · 14 comments

Comments

@wisniewskit
Copy link

What is the issue with the MIME Sniffing Standard?

While trying to pass all of mimesniff/mime-types/parsing.any.html in Firefox, I've noticed that as I start passing more of the tests there by sending the content-type through our standards-compliant MIME type parser, other tests suddenly fail in FileAPI/file/File-constructor.any.html, such as:

  FAIL Valid contentType ("text/plain;charset = UTF-8") - assert_equals: expected "text/plain;charset = utf-8" but got "text/plain"

Furthermore, the current File API specs 1 2 3 don't mention passing the content type through a MIME validator, meaning that a lot of the tests in parsing.any.html are invalid, because they necessarily construct Blobs or Files (also the tests for Request/Response use blob() as well).

I'm not sure how to rectify this. The most obvious alternatives are of course to either:

  1. change the Blob constructor spec to be more strict (but I'm not sure if this is risky).
  2. change the parsing.any.js expectations for the Blob cases to expect blob-standard content type parsing.

@annevk , what do you think? I recall conversations about this in the past, but don't remember if option 1 is feasible. Would #2 be the way to go in that case, or should we defer this for now? (I'm open to just having Firefox match Chrome and Safari as closely as possible for now, if we're not ready to tackle this anytime soon).

@annevk
Copy link
Member

annevk commented Apr 26, 2024

Thank you for raising this.

I think we should align those tests as per the discussion in w3c/FileAPI#43. We should really also change the File API specification.

I think you are correct though that there is a risk here. I hope it's small enough that this is a change we can successfully make as the current File API specification is rather broken (lowercasing MIME type parameter values is just flat out wrong for instance).

@wisniewskit
Copy link
Author

That sounds fine to me, but how do you feel we should coordinate this? Should we change the spec, then the WPTs, then the implementations? Are there any other folks we should ping to get their input first? Or would it be best to have an implementation try changing to pass 100% of the parsing.any tests first, before we do any of that?

@annevk
Copy link
Member

annevk commented Apr 26, 2024

If you're prepared to work on this, I think at any point we can align the WPTs with the agreement reached in w3c/FileAPI#43 (comment) and mimesniff/mime-types/parsing.any.html which are based on that agreement. The specification and implementations should be updated as well of course. I don't think the order particularly matters. Hope that helps!

cc @mkruisselbrink @pwnall

@wisniewskit
Copy link
Author

wisniewskit commented Apr 28, 2024

Just FYI, I've managed to make a patch which would get Firefox passing the parsing.any WPT, which let me discover which other WPTs would need to be updated because of the parse+serialize round-trip changing the expected results (either because they expect a space to be preserved after the semicolon in a blob content-type, or lowercasing to be done, or expect new Blob(x, { type: null }) to stringify the type to null rather than an empty string):

  • FileAPI/blob/Blob-constructor.any.js
  • FileAPI/blob/Blob-slice.any.js
  • FileAPI/file/File-constructor.any.js
  • clipboard-apis/clipboard-item.https.html
  • fetch/api/request/request-init-contenttype.any.js
  • fetch/api/response/response-consume.html
  • fetch/api/response/response-init-contenttype.any.js
  • xhr/setrequestheader-content-type.htm

Since this would of course mean that Chrome and Safari will suddenly start failing all of those WPTs if I land this patch, I thought I would mention it here before I try to do so. If we feel this is a change better delayed to interop2025 (for instance), then I'm alright with that.

I think in the meantime I could change Firefox to basically match Chrome and Safari's behavior on the pass/fails on the test, to at least have something relatively close to interop for now, until other vendors are ready to deal with this.

@annevk
Copy link
Member

annevk commented Apr 29, 2024

I'm not really involved in Interop, but it seems good to have those tests match the actual requirements and it seems like there's still quite a bit of time left for 2024.

@wisniewskit
Copy link
Author

@annevk , I've just uncovered a possible issue with using Extract-a-MimeType on a Blob's content-type.

Consider https://fetch.spec.whatwg.org/#ref-for-mime-type-essence

What if we try to navigator.sendBeacon a Blob with the content-type of application/json, text/plain? Right now, browsers will disallow it, because the full content-type "as a string" is not allowed.

But if we Extract-a-MimeType, the content-type becomes text/plain, which is allowed.

So should we allow that blob to be sent in our brave new world? (the warning on the spec page would imply no).

Assuming we should not, how else should we process the content-type of blobs? (Or am I overthinking this?)

Note that WPTs like resource-timing/content-type-parsing.html expect Extract-a-MimeType to be used to handle commas (which browsers are of course not passing at the moment).

@wisniewskit
Copy link
Author

@annevk, wait... if I use Extract-a-Mime-Type on the blob's content-type, then I start failing tests in parsing.any.html as well..

  FAIL x/x;,=x;bonus=x (Request/Response) - assert_equals: expected "x/x;bonus=x" but got "x/x"
  FAIL x/x;x=,;bonus=x (Blob/File) - assert_equals: Blob expected "x/x;x=\",\";bonus=x" but got "x/x"

@annevk
Copy link
Member

annevk commented May 2, 2024

Can you give a more complete scenario? Extract a MIME type should only be used where it is defined to be used. Note that for sendBeacon() for instance it uses the result of that operation as the input for the Content-Type header (though it forgets to serialize). That means it's safe.

For the Blob constructor however I would not expect that algorithm to be used. I would expect us to use the MIME type parser directly.

@wisniewskit
Copy link
Author

wisniewskit commented May 2, 2024

Ah, ok, so the Blob constructor/slice should just use the MIME parser and serialize, that makes sense.

In that case the tests will just need to adjust for that expectation (if they create a blob with type application/json, text/plain, then they should expect the blob to end up with an empty type).

Thanks, glad to hear I was overthinking it.

@domenic
Copy link
Member

domenic commented May 7, 2024

My understanding of this issue and the history is as follows:

  • The File API spec for Blob is not clear, and not well-maintained.
    • It's current behavior is to accept strings as-is, unless they contain characters outside U+0020 to U+007E, in which case we return empty.
    • Per Implementations allow all values in type getter w3c/FileAPI#43, implementations instead accept any string
    • In no cases does the MIME type get parsed or normalized
    • At some point, someone landed tests that don't match the spec, without updating the spec
  • Fetch's blob() spec is slightly more clear and well-maintained, but not completely
    • It attempts to initialize Blob's type attribute, which is a string, with the result of "get the MIME type", which is a MIME type struct. Type error.
    • But you can kind of assume that a normalization step is supposed to happen here.
    • Based on WPT results, no implementation appears to do this sort of round-tripping
    • I'm unsure if this was always specced to include a normalization step, and implementations never implemented it; or if we updated Fetch to include a normalization step at some point, and forgot to get implementer agreement; or if we updated Fetch to include a normalization step at some point, and implementers agreed in theory, but haven't prioritized the work.

To move this issue forward, I think we need a clear multi-implementer decision on which path to take:

  • No normalization
    • Remove the code point range requirement from the Blob constructor
    • Make requestOrResponse.blob() just use the raw Content-Type string instead of the result of "get the MIME type"
    • Update all the tests (and unfortunately lose all test coverage for the MIME type parser...)
  • Yes normalization

It would help to get a full list of impacted APIs. We know new Blob(...).type and requestOrResponse.blob() are impacted. Based on #189 (comment) it sounds like there might be more?

If someone else is willing to get a full list of APIs, ideally each with small 1-2 line code examples of before/after the change, then I can try to get an official Chromium position. I guess relevant people from WebKit and Mozilla are already on this thread :).

@annevk
Copy link
Member

annevk commented May 7, 2024

@domenic the test situation for Blob was discussed in w3c/FileAPI#43 (comment) with agreement from Chromium engineers at the time. Of course, that's a long time ago now, but that general plan still seems good.

@domenic
Copy link
Member

domenic commented May 7, 2024

Yeah, that engineer no longer works on Chromium, so I think re-affirming the position (with the additional data that 7 years have passed with no change) would be good.

@wisniewskit
Copy link
Author

@domenic , I'm afraid that all I can realistically commit to on my time-budget is:

  • finding out which WPTs will need to be changed to follow the parse+serialize approach (at least for tests where Firefox gets far enough to figure that out), and prepare a patch for that.
  • pushing to try gradually shipping this in Firefox, to determine if there is any fallout on actual websites for Firefox users.

If it seems web-compatible to do this, then I would hope it's a compelling enough argument for others to help move the needle here as well. And if it's not web-compatible then "no normalization" would seem to be the default path we'll have to take.

@domenic
Copy link
Member

domenic commented May 8, 2024

Understood. I guess we'll leave it with no Chromium position / promise to change then, but maybe WebKit and Gecko can ship the change and pave the way for Chromium to eventually follow!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants