Web platform tests disagree on how to handle blob content-type validation #189

wisniewskit · 2024-04-26T00:18:09Z

What is the issue with the MIME Sniffing Standard?

While trying to pass all of mimesniff/mime-types/parsing.any.html in Firefox, I've noticed that as I start passing more of the tests there by sending the content-type through our standards-compliant MIME type parser, other tests suddenly fail in FileAPI/file/File-constructor.any.html, such as:

  FAIL Valid contentType ("text/plain;charset = UTF-8") - assert_equals: expected "text/plain;charset = utf-8" but got "text/plain"

Furthermore, the current File API specs 1 2 3 don't mention passing the content type through a MIME validator, meaning that a lot of the tests in parsing.any.html are invalid, because they necessarily construct Blobs or Files (also the tests for Request/Response use blob() as well).

I'm not sure how to rectify this. The most obvious alternatives are of course to either:

change the Blob constructor spec to be more strict (but I'm not sure if this is risky).
change the parsing.any.js expectations for the Blob cases to expect blob-standard content type parsing.

@annevk , what do you think? I recall conversations about this in the past, but don't remember if option 1 is feasible. Would #2 be the way to go in that case, or should we defer this for now? (I'm open to just having Firefox match Chrome and Safari as closely as possible for now, if we're not ready to tackle this anytime soon).

The text was updated successfully, but these errors were encountered:

annevk · 2024-04-26T09:39:16Z

Thank you for raising this.

I think we should align those tests as per the discussion in w3c/FileAPI#43. We should really also change the File API specification.

I think you are correct though that there is a risk here. I hope it's small enough that this is a change we can successfully make as the current File API specification is rather broken (lowercasing MIME type parameter values is just flat out wrong for instance).

wisniewskit · 2024-04-26T12:45:26Z

That sounds fine to me, but how do you feel we should coordinate this? Should we change the spec, then the WPTs, then the implementations? Are there any other folks we should ping to get their input first? Or would it be best to have an implementation try changing to pass 100% of the parsing.any tests first, before we do any of that?

annevk · 2024-04-26T13:54:37Z

If you're prepared to work on this, I think at any point we can align the WPTs with the agreement reached in w3c/FileAPI#43 (comment) and mimesniff/mime-types/parsing.any.html which are based on that agreement. The specification and implementations should be updated as well of course. I don't think the order particularly matters. Hope that helps!

cc @mkruisselbrink @pwnall

wisniewskit · 2024-04-28T17:14:16Z

Just FYI, I've managed to make a patch which would get Firefox passing the parsing.any WPT, which let me discover which other WPTs would need to be updated because of the parse+serialize round-trip changing the expected results (either because they expect a space to be preserved after the semicolon in a blob content-type, or lowercasing to be done, or expect new Blob(x, { type: null }) to stringify the type to null rather than an empty string):

FileAPI/blob/Blob-constructor.any.js
FileAPI/blob/Blob-slice.any.js
FileAPI/file/File-constructor.any.js
clipboard-apis/clipboard-item.https.html
fetch/api/request/request-init-contenttype.any.js
fetch/api/response/response-consume.html
fetch/api/response/response-init-contenttype.any.js
xhr/setrequestheader-content-type.htm

Since this would of course mean that Chrome and Safari will suddenly start failing all of those WPTs if I land this patch, I thought I would mention it here before I try to do so. If we feel this is a change better delayed to interop2025 (for instance), then I'm alright with that.

I think in the meantime I could change Firefox to basically match Chrome and Safari's behavior on the pass/fails on the test, to at least have something relatively close to interop for now, until other vendors are ready to deal with this.

annevk · 2024-04-29T07:00:34Z

I'm not really involved in Interop, but it seems good to have those tests match the actual requirements and it seems like there's still quite a bit of time left for 2024.

wisniewskit · 2024-05-02T14:29:56Z

@annevk , I've just uncovered a possible issue with using Extract-a-MimeType on a Blob's content-type.

Consider https://fetch.spec.whatwg.org/#ref-for-mime-type-essence

What if we try to navigator.sendBeacon a Blob with the content-type of application/json, text/plain? Right now, browsers will disallow it, because the full content-type "as a string" is not allowed.

But if we Extract-a-MimeType, the content-type becomes text/plain, which is allowed.

So should we allow that blob to be sent in our brave new world? (the warning on the spec page would imply no).

Assuming we should not, how else should we process the content-type of blobs? (Or am I overthinking this?)

Note that WPTs like resource-timing/content-type-parsing.html expect Extract-a-MimeType to be used to handle commas (which browsers are of course not passing at the moment).

wisniewskit · 2024-05-02T15:02:05Z

@annevk, wait... if I use Extract-a-Mime-Type on the blob's content-type, then I start failing tests in parsing.any.html as well..

  FAIL x/x;,=x;bonus=x (Request/Response) - assert_equals: expected "x/x;bonus=x" but got "x/x"
  FAIL x/x;x=,;bonus=x (Blob/File) - assert_equals: Blob expected "x/x;x=\",\";bonus=x" but got "x/x"

annevk · 2024-05-02T15:11:20Z

Can you give a more complete scenario? Extract a MIME type should only be used where it is defined to be used. Note that for sendBeacon() for instance it uses the result of that operation as the input for the Content-Type header (though it forgets to serialize). That means it's safe.

For the Blob constructor however I would not expect that algorithm to be used. I would expect us to use the MIME type parser directly.

wisniewskit · 2024-05-02T15:59:01Z

Ah, ok, so the Blob constructor/slice should just use the MIME parser and serialize, that makes sense.

In that case the tests will just need to adjust for that expectation (if they create a blob with type application/json, text/plain, then they should expect the blob to end up with an empty type).

Thanks, glad to hear I was overthinking it.

domenic · 2024-05-07T13:06:43Z

My understanding of this issue and the history is as follows:

The File API spec for Blob is not clear, and not well-maintained.
- It's current behavior is to accept strings as-is, unless they contain characters outside U+0020 to U+007E, in which case we return empty.
- Per Implementations allow all values in type getter w3c/FileAPI#43, implementations instead accept any string
- In no cases does the MIME type get parsed or normalized
- At some point, someone landed tests that don't match the spec, without updating the spec
Fetch's blob() spec is slightly more clear and well-maintained, but not completely
- It attempts to initialize Blob's type attribute, which is a string, with the result of "get the MIME type", which is a MIME type struct. Type error.
- But you can kind of assume that a normalization step is supposed to happen here.
- Based on WPT results, no implementation appears to do this sort of round-tripping
- I'm unsure if this was always specced to include a normalization step, and implementations never implemented it; or if we updated Fetch to include a normalization step at some point, and forgot to get implementer agreement; or if we updated Fetch to include a normalization step at some point, and implementers agreed in theory, but haven't prioritized the work.

To move this issue forward, I think we need a clear multi-implementer decision on which path to take:

No normalization
- Remove the code point range requirement from the Blob constructor
- Make requestOrResponse.blob() just use the raw Content-Type string instead of the result of "get the MIME type"
- Update all the tests (and unfortunately lose all test coverage for the MIME type parser...)
Yes normalization
- Change the Blob spec to store a MIME type and then serialize it in the type getter
- This will make it easier to fix Fetch's type mismatch
- Update the other tests listed in Web platform tests disagree on how to handle blob content-type validation #189 (comment), and maybe more specs?

It would help to get a full list of impacted APIs. We know new Blob(...).type and requestOrResponse.blob() are impacted. Based on #189 (comment) it sounds like there might be more?

If someone else is willing to get a full list of APIs, ideally each with small 1-2 line code examples of before/after the change, then I can try to get an official Chromium position. I guess relevant people from WebKit and Mozilla are already on this thread :).

annevk · 2024-05-07T13:15:29Z

@domenic the test situation for Blob was discussed in w3c/FileAPI#43 (comment) with agreement from Chromium engineers at the time. Of course, that's a long time ago now, but that general plan still seems good.

domenic · 2024-05-07T23:28:06Z

Yeah, that engineer no longer works on Chromium, so I think re-affirming the position (with the additional data that 7 years have passed with no change) would be good.

wisniewskit · 2024-05-08T01:13:08Z

@domenic , I'm afraid that all I can realistically commit to on my time-budget is:

finding out which WPTs will need to be changed to follow the parse+serialize approach (at least for tests where Firefox gets far enough to figure that out), and prepare a patch for that.
pushing to try gradually shipping this in Firefox, to determine if there is any fallout on actual websites for Firefox users.

If it seems web-compatible to do this, then I would hope it's a compelling enough argument for others to help move the needle here as well. And if it's not web-compatible then "no normalization" would seem to be the default path we'll have to take.

domenic · 2024-05-08T01:49:53Z

Understood. I guess we'll leave it with no Chromium position / promise to change then, but maybe WebKit and Gecko can ship the change and pave the way for Chromium to eventually follow!

annevk mentioned this issue May 26, 2024

Modernize Blob constructor a tiny bit w3c/FileAPI#125

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Web platform tests disagree on how to handle blob content-type validation #189

Web platform tests disagree on how to handle blob content-type validation #189

wisniewskit commented Apr 26, 2024

annevk commented Apr 26, 2024

wisniewskit commented Apr 26, 2024

annevk commented Apr 26, 2024

wisniewskit commented Apr 28, 2024 •

edited

annevk commented Apr 29, 2024

wisniewskit commented May 2, 2024

wisniewskit commented May 2, 2024

annevk commented May 2, 2024

wisniewskit commented May 2, 2024 •

edited

domenic commented May 7, 2024 •

edited

annevk commented May 7, 2024

domenic commented May 7, 2024

wisniewskit commented May 8, 2024

domenic commented May 8, 2024

Web platform tests disagree on how to handle blob content-type validation #189

Web platform tests disagree on how to handle blob content-type validation #189

Comments

wisniewskit commented Apr 26, 2024

What is the issue with the MIME Sniffing Standard?

annevk commented Apr 26, 2024

wisniewskit commented Apr 26, 2024

annevk commented Apr 26, 2024

wisniewskit commented Apr 28, 2024 • edited

annevk commented Apr 29, 2024

wisniewskit commented May 2, 2024

wisniewskit commented May 2, 2024

annevk commented May 2, 2024

wisniewskit commented May 2, 2024 • edited

domenic commented May 7, 2024 • edited

annevk commented May 7, 2024

domenic commented May 7, 2024

wisniewskit commented May 8, 2024

domenic commented May 8, 2024

wisniewskit commented Apr 28, 2024 •

edited

wisniewskit commented May 2, 2024 •

edited

domenic commented May 7, 2024 •

edited