Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Miniflare malforms utf8 characters #326

Closed
Skye-31 opened this issue Aug 2, 2022 · 5 comments
Closed

[bug] Miniflare malforms utf8 characters #326

Skye-31 opened this issue Aug 2, 2022 · 5 comments
Labels
behaviour mismatch Different behaviour to Workers runtime
Milestone

Comments

@Skye-31
Copy link
Contributor

Skye-31 commented Aug 2, 2022

Using the following code, and running miniflare --modules worker.js results in the file being malformed in a way it shouldn't be, as shown with the image.
image

const content = `<!DOCTYPE html>
<html>
    <head>
        <title>ń</title>
    </head>
    <body>
        <p>ń</p>
    </body>
</html>`;

export default {
  fetch(req) {
    return new Response(content, {
      status: 200,
      headers: {
        'content-type': 'text/html'
      }
    });
  }
}
@CraigglesO
Copy link
Contributor

CraigglesO commented Aug 2, 2022

Hey! I did some digging on this because it looked interesting.

Here is the response from miniflare if you pull up developer tools -> Network:

Screen Shot 2022-08-02 at 17 18 56

Notice the response was never impacted. "ń" is still there.

Running a check on the character in nodejs terminal:

"ń".charCodeAt(0)
// 324

https://unicode-table.com/en/0144/
This matches unicode UTF-16BE.

With some digging the answer is that HTML by default treats everything as utf-8!!! Wow, how old school html is.

You have two options:

  1. Convert all utf-16 with BOM (Byte order mark) like so (not recommended):
const content = `<!DOCTYPE html>
<html>
    <head>
        <title>ń</title>
    </head>
    <body>
        <p>ń</p>
    </body>
</html>`;

export default {
  fetch(req) {
    const BOM = [0xEF, 0xBB, 0xBF];
    // convert content to UTF-8 with BOM
    const contentUTF8 = new TextEncoder().encode(content);
    const contentWithBOM = new Uint8Array(BOM.length + contentUTF8.length);
    contentWithBOM.set(BOM);
    contentWithBOM.set(contentUTF8, BOM.length);

    return new Response(contentWithBOM, {
      status: 200,
      headers: {
        'content-type': 'text/html'
      }
    });
  }
}
  1. Use a more modern framework that does everything in JS to avoid these kinds of problems.

@Skye-31
Copy link
Contributor Author

Skye-31 commented Aug 2, 2022

Just passing on this issue from @Hexstream's referenced issue which I was debugging.

It's a pages site using wrangler. When running wrangler pages dev this bug occurs, resulting in assets linked on that page which contain the character being a 404, as it is looking for a different character. However serving the assets in production is fine, leading to us investigating it on miniflare's end, and locating the issue

We initially thought it was a wrangler issue, however then narrowed it down to miniflare somewhere between the wrangler server returning the response and it reaching the browser. As this issue occurs in Miniflare but not production, I have a feeling there's something that can be done here for it to be resolved.

@CraigglesO
Copy link
Contributor

I understand the issue now.

To clarify for this thread:

When a browser accepts a request, it first converts anything not within utf-8 like so:
My request: http://localhost:8788/ń
What the server (Miniflare) sees: http://localhost:8788/%C5%84

This is called percent encoding.

However, a quick look at Miniflare, it indeed does not decode the URI prior to checking against the file location, but keeps the percent encoding. This is also a problem with other special characters.

As stated before though, Miniflare is returning the data correctly, it's up to the user to ensure HTML with UTF-16 is parsed correctly.

@Hexstream
Copy link

Hexstream commented Aug 3, 2022

[edit: Mostly responding to @Skye-31, sorry for late reply...]

My issue has everything to do with URLs and nothing to do with HTML.

Basically, entering these URLs directly in the address bar,

https://www.example.com/%C5%84/ (in production) works, while

http://localhost:8788/%C5%84/ (with wrangler pages dev) does not.

(You can also enter the URLs in the address bar with ń directly, but this is a user convenience feature and %C5%84 will actually be sent anyway.)

I seem to understand that miniflare is not mapping from percent-encoded URLs to files correctly.

[edit: #327 indeed looks like a likely fix!]

@mrbbot
Copy link
Contributor

mrbbot commented Aug 11, 2022

Hey! 👋 Apologies for the delayed response. I've recently returned from a long holiday and am just starting to catch up on issues now.

I think this issue is caused by Miniflare not populating __STATIC_CONTENT_MANIFEST. Miniflare does this to disable caching of Workers Sites files, so the latest file is always served. This has caused an another issue before though.

More specifically, @cloudflare/kv-asset-handler will check if the URI decoded path is in the manifest. If it is, it will URI decode the path before looking up the asset in KV.

As a temporary solution, it looks like there's a pathIsEncoded option to always enable URI decoding. I'd enable this when running in Miniflare with something like this (assuming you're using a modules format worker):

import { getAssetFromKV, NotFoundError } from "@cloudflare/kv-asset-handler";
import manifestJSON from "__STATIC_CONTENT_MANIFEST";
const assetManifest = JSON.parse(manifestJSON);

export default {
  async fetch(request, env, ctx) {
    try {
      return await getAssetFromKV(
        { request, waitUntil: ctx.waitUntil.bind(ctx) },
        {
          ASSET_NAMESPACE: env.__STATIC_CONTENT,
          ASSET_MANIFEST: assetManifest,
          pathIsEncoded: globalThis.MINIFLARE, // ⬅️ This is the important bit
        }
      );
    } catch (e) {
      if (e instanceof NotFoundError) {
        return new Response(null, { status: 404 });
      }
      throw e;
    }
  },
};

@mrbbot mrbbot added the behaviour mismatch Different behaviour to Workers runtime label Aug 11, 2022
@mrbbot mrbbot added this to the 2.7.0 milestone Aug 11, 2022
@mrbbot mrbbot modified the milestones: 2.7.0, 2.8.0 Aug 19, 2022
@mrbbot mrbbot closed this as completed in 9ce44ba Sep 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
behaviour mismatch Different behaviour to Workers runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants