Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: decode url for middleware #4728

Merged
merged 14 commits into from Aug 27, 2021
Merged

Conversation

hannoeru
Copy link
Sponsor Contributor

@hannoeru hannoeru commented Aug 25, 2021

Description

#4600 deletes decode URL middleware because decode will be processed by sirv, but we still need to decode URL for
transform middleware.

Additional context


What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Before submitting the PR, please make sure you do the following

  • Read the Contributing Guidelines.
  • Read the Pull Request Guidelines and follow the Commit Convention.
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. fixes #123).
  • Ideally, include relevant tests that fail without this PR but pass with it.

@hannoeru hannoeru changed the title fix: decode url for transform middleware fix: decode url for middleware Aug 25, 2021
packages/vite/src/node/server/index.ts Outdated Show resolved Hide resolved
@Shinigami92 Shinigami92 added the p3-minor-bug An edge case that only affects very specific usage (priority) label Aug 25, 2021
hannoeru and others added 3 commits August 25, 2021 23:29
Co-authored-by: CHOYSEN <choysen@qq.com>
Co-authored-by: Shinigami <chrissi92@hotmail.de>
Co-authored-by: Jeff Yang <32727188+ydcjeff@users.noreply.github.com>
@hannoeru
Copy link
Sponsor Contributor Author

hannoeru commented Aug 25, 2021

This middleware is copied from the old commits, I should check it first 😓

@patak-dev
Copy link
Member

pinging @benmccann so it is aware of this

@hannoeru do you think we could add a test case to avoid future regressions?

@benmccann
Copy link
Collaborator

benmccann commented Aug 25, 2021

This isn't the right fix. URLs should not be globally decoded in middleware. The decodeURI middleware will cause bugs for all middlewares that come after it for a couple reasons:

  • Almost all middlewares expect the req.url has not been decoded. Express, Connect, Polka, etc. do not decode the URL by default, so by giving the URL in a different format, you break some of these middlewares
  • You irreversibly destroy information that was present in the original URL. For an example of this see https://answers.netlify.com/t/bug-fix-url-encoding-preserved-in-function-event/27080 where Netlify was at one point decoding URLs by default and then realized that was a broken behavior and stopped doing it

Rather than decode the request URL globally in middleware, each consumer should decode as necessary if required for their use case.

I'm fairly confident that sirv@1.0.14 does decode URLs correctly. We use it in SvelteKit's adapter-node and it's working fine. But if there's some issue with it, that should be reported and fixed in sirv and not changed here. If you really really want to work around it you should do something like stick the original URL property on the request when decoding and then add another middleware that restores the original URL as received from the browser after the buggy middleware has processed the request rather than leaving the URL altered for all subsequent middlewares

I don't see any description of what bug is being hit, how to reproduce it, etc. I would guess that if anything needs to be changed it's something in Vite itself that should decode the URL just in that middleware and not globally

@hannoeru
Copy link
Sponsor Contributor Author

hannoeru commented Aug 25, 2021

Currently, if you serving files that using Unicode filename it will fail in transform middleware because it can't resolve to a correct path in the filesystem.

I just checked express's source code, decode URL inside middleware is the normal way to do it so I will change it.

About sirv, I also checked its source code, it uses @polka/url to parse URL but didn't decode it, I think in SvelteKit will work is because it uses polka and you made a pull request that will decode URL for all request https://github.com/lukeed/polka/pull/172/files, I didn't test it but if I'm wrong can you point which part of the source code that decoded URL?

@benmccann
Copy link
Collaborator

you made a pull request that will decode URL for all request

That's exactly the opposite of what that pull request does 😄 The PR description is: "do not decode req.path by default". It stops polka from decoding the URL because it was a bug that it was being decoded. Now it's only params that are decoded

sirv does decode the path. It does it here: https://github.com/lukeed/sirv/blob/83ee458eac97258952e375766834800fba892408/packages/sirv/index.js#L163

The second parameter to parse is whether to decode and is set to true

@hannoeru
Copy link
Sponsor Contributor Author

The second parameter to parse is whether to decode and is set to true

But parse only take one argument https://github.com/lukeed/polka/blob/master/packages/url/index.js, why 🧐

@hannoeru
Copy link
Sponsor Contributor Author

Got it, thank you for the clarification, I will make some changes.

@hannoeru hannoeru marked this pull request as draft August 25, 2021 20:50
@hannoeru
Copy link
Sponsor Contributor Author

@benmccann Just add some test than I found that sirv didn't decode URL if you pass the same request twice, it ignores decode URL because it set req._decoded but didn't replace req.url which is correct, but next time it will think req.url is decoded.

https://github.com/lukeed/polka/blob/e0a65c781b763f7421f1a64d4166b3911f4d20e9/packages/url/index.js#L40-L41

@hannoeru hannoeru marked this pull request as ready for review August 25, 2021 22:16
@benmccann
Copy link
Collaborator

Ok. I would say let's see if we can get sirv fixed rather than doing the workaround here. I filed an issue for it here: lukeed/sirv#115. @lukeed was quite fast in getting a fix in and released the last time I hit an issue

Thanks for tracking down the issue @hannoeru !

@lukeed
Copy link

lukeed commented Aug 26, 2021

just published sirv@1.0.15 – thanks for the report 👍

@hannoeru
Copy link
Sponsor Contributor Author

hannoeru commented Aug 26, 2021

@lukeed that didn't work because connect will replace req._parsedUrl every time when you call next() inside middleware if it's different from the raw URL or is not a URL Instance.

The first time it will set req._decoded and _parsedUrl, but req._parsedUrl will be replaced by connect, so the second time when we call it, it will check req._decoded that get req._parsedUrl which is not decoded.

Maybe we can use another variable like req._decodedUrl to avoid conflict with connect.

@lukeed
Copy link

lukeed commented Aug 26, 2021

That's...odd. Performs great I'm sure 😮‍💨

I left a TODO for myself to improve this later, but guess I'll do it now.

@lukeed
Copy link

lukeed commented Aug 26, 2021

Then you'll still need your cleanup after all.

Decoding is expensive and needs to be prevented as much as possible. There's no other sane/reliable way to cache this value per request lifecycle, and arguably passing the same request thru the same middleware but with different urls is out of the ordinary.

I don't know how connect works, but a core thing Polka/Express does is make sure that any req.url mutations are reverted when the request leaves a subrouter/middleware group.

The only thing I can do on my site is remove decoding from @polka/url entirely & have sirv do this:

let pathname = decodeURIComponent(req.path || parser(req));

I'd have to check dependents and see how inconvenient this may be for them.

@Shinigami92 Shinigami92 added the bug: upstream Bug in a dependency of Vite label Aug 26, 2021
@Shinigami92
Copy link
Member

Wow, just one sleep and these comments going wild (in a positive way 😃)
This is Open Source how I 💚 it 🚀
Shoutout to all participants 💪

@hannoeru
Copy link
Sponsor Contributor Author

I just changed to cleanup req._decoded before entering sirv in this PR, if anyone has a better idea we can open another PR in the feature.

Shinigami92
Shinigami92 previously approved these changes Aug 26, 2021
@lukeed
Copy link

lukeed commented Aug 26, 2021

Just to clarify – this isn't an upstream bug. This has to do with how vite is manually juggling & manipulating req.url, which is affecting an upstream's cache.

@patak-dev patak-dev removed the bug: upstream Bug in a dependency of Vite label Aug 26, 2021
benmccann
benmccann previously approved these changes Aug 26, 2021
patak-dev
patak-dev previously approved these changes Aug 26, 2021
@lukeed
Copy link

lukeed commented Aug 27, 2021

Sorry, I'm back 🙃 Turns out, no one else cared about the built-in decoding in the URL parser, so I removed it & implemented it manually in sirv. I also removed any reliance on previous values, so the req._decoded flagging is gone now.

These changes are now available as sirv@1.0.17, if you want to update this PR (again 🙈) @hannoeru. Thanks all~!

@patak-dev
Copy link
Member

Thanks for implementing this tweak in sirv so fast @lukeed! @hannoeru I think it is a good idea to use sirv@1.0.17 as proposed as it will be easier to maintain, thanks for pushing for this fix forward and for the patience during all these changes

@hannoeru hannoeru dismissed stale reviews from patak-dev and benmccann via 49f1c52 August 27, 2021 09:25
@patak-dev patak-dev merged commit 824d042 into vitejs:main Aug 27, 2021
aleclarson pushed a commit to aleclarson/vite that referenced this pull request Nov 8, 2021
Co-authored-by: CHOYSEN <choysen@qq.com>
Co-authored-by: Shinigami <chrissi92@hotmail.de>
Co-authored-by: Jeff Yang <32727188+ydcjeff@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p3-minor-bug An edge case that only affects very specific usage (priority)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants