-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parameters decoding #253
Fix parameters decoding #253
Conversation
Given https://owasp.org/www-community/Double_Encoding, we can't regress on this topic. |
In that case, the path-to-regexp library has this problem 'use strict'
const { equal } = require('assert')
const { match } = require('path-to-regexp')
function doubleEncode (str) {
return encodeURIComponent(encodeURIComponent(str))
}
function singleEncode (str) {
return encodeURIComponent(str)
}
const handler = match('/🍌/:id', { decode: decodeURIComponent });
equal(handler(decodeURI(`/%F0%9F%8D%8C/${singleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, singleEncode('#')); // fails |
Should we just document the current behavior, or maybe adopt what URLPattern does? |
I'll think about it. URLPattern doesn't decode a URL at all. It encodes route and request URLs and then matches two encoded URLs. It uses URL class for it. URL class encodes a URL not just with encodeURI or encodeURIComponent function, it implements the basic URL parser (https://url.spec.whatwg.org/#concept-basic-url-parser) which has some conflicts with rules that find-my-way has. For example, we can have ':' and '#' in the find-my-way routes, but in the URL class, they are used as delimiters. This means we can't just use the URL class for our purposes as URLPattern does. |
c9fb58c
to
66ea94b
Compare
There is another good solution to this issue. We can compare encoded URLs instead of comparing decoded URLs. I added a new findMyWay.on('GET', '/🍌', handler)
findMyWay.find('GET', '/🍌') // this URL will match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL won't findMyWay.on('GET', '/🍌', { encode: true }, handler)
findMyWay.find('GET', '/🍌') // this URL will NOT match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL will match the route path-to-regexp resolves this issue in the same way. There is a !!! This is a semver-major change We don't need to rush into this. It would be better to have as many reviews as possible before it is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lgtm this but the question for us is if we skip this in Fastify v4 or not... and how we expose it there.
I think you have added a new option, could you document it too? |
I was trying to figure out how Express provides this path-to-regexp option. It looks like it just ignore it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we save the original path for display in prettyPrint
method?
This PR actually fixes a regression introduced in #207. Without this the following test would not pass: diff --git a/test/find.test.js b/test/find.test.js
index 2de277f..9802494 100644
--- a/test/find.test.js
+++ b/test/find.test.js
@@ -15,3 +15,23 @@ test('find calls can pass no constraints', t => {
t.ok(findMyWay.find('GET', '/a/b'))
t.notOk(findMyWay.find('GET', '/a/b/c'))
})
+
+test('find with encoded routes', t => {
+ t.plan(2)
+ const findMyWay = FindMyWay()
+
+ findMyWay.on('GET', '/a%20.md', () => {
+ t.pass('handler called')
+ })
+
+ t.ok(findMyWay.find('GET', '/a%20.md'))
+
+ findMyWay.lookup({
+ method: 'GET',
+ url: '/a%20.md'
+ }, {
+ end () {
+ t.fail('end should not be called')
+ }
+ })
+}) Note that the above test does not fail on v4. This is needed to update fastify-static to Fastify v4, so I'm inclined to release it as semver-minor. @Eomm wdyt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Eomm wdyt?
@mcollina I think this PR reverts the PR you pointed out, but it removes a feature.
The #207 focuses on decoding the path parameters
This PR, instead:
- provides an
encode
option that acts during the path parameter digesting into the radix-tree - restoring the old logic and the old the lib doesn't handle url-encoded URI properly #206 issue. I recall that the main issue was the
fast-decode-uri-component
usage. I don't remember further details.
Regarding this PR, in fact, this test does not pass:
test('find with encoded routes', t => {
t.plan(2)
const findMyWay = FindMyWay()
findMyWay.on('GET', '/~smith/home.html', () => {
t.pass('handler called')
})
t.ok(findMyWay.find('GET', '/~smith/home.html'))
t.ok(findMyWay.find('GET', '/%7Esmith/home.html'))
})
Instead, regarding the failing test you added, I think the correct endpoint to call would be:
t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md'))
because it would be the client that, finding a special character, should encode it.
At this point, I think we can't support both options
Reverting seems somewhat impossible. |
If you can point me to a problem, I can help. |
On: t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md')) Why? The file we want is |
I need #253 passing. |
@mcollina It's a breaking change. You can't anymore match not decoded and one time decoded URL, setting only one not decoded URL. You need to set them both. |
You will match exactly that route, that you set up. |
I've resolved the problem with fastify-static by removing the encodeURI: https://github.com/fastify/fastify-static/blob/dee5d8014c0831dc4721f3b09062a75d7348eead/index.js#L341 |
The
I'll try to clarify my point: starting from the spec: https://www.rfc-editor.org/rfc/rfc3986#section-2.4
for this statement, I think that rollbacking the #207 PR is incorrect. I think this module must support the following test without any additional options set by the developer:
So, I think this PR can add the |
@Eomm Yes, some of the already created routes will not work. That is why this is a breaking change. But the main difference is that with this PR, you can create and match any route you want, and in the current state, you can't. For example, if the route contains an encoded part in a static and parametric part, you have no way to implement this. test('find with encoded routes', t => {
t.plan(2)
const findMyWay = FindMyWay()
findMyWay.on('GET', '/🍌smith/:param', () => {
t.pass('handler called')
})
t.ok(findMyWay.find('GET', '/🍌smith/%2520.m'))
t.ok(findMyWay.find('GET', '/%F0%9F%8D%8Csmith/%2520.m'))
}) |
path-to-regexp and URLPattern also don't decode a url with the decodeURI function. path-to-regexp match exactly the same path as the path you registered and then can decode params with decodeURIComponent and URLPattern creates an instance of URL class which encodes a URL (non exactly with the encodeURI function, but in a similar way). |
If you want, we can set both raw and encoded URLs by default. I don't think it's a great idea, but it will make the number of routes that will fail after updating less. |
7bf0b8d
to
5ed535f
Compare
I'm trying to implement the second version of this fix. I want to make it work as URLPattern. The problem is that URLPattern also doesn't handle all cases that we want. Examples that don't work: const pattern = new URLPattern('https://host.com/[...]/a .html')
pattern.exec('https://host.com/%5B...%5D/a .html') const pattern = new URLPattern('https://host.com/%20.html')
pattern.exec('https://host.com/%2520.html') // double encoding |
I don't want to leave this issue unresolved, because the current implementation has a significant bug and is not very fast. |
I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore. |
Open an issue on URLPattern. |
I'm not really convinced of this and @Eomm is not in agreement. (I'm currently focusing on shipping Fastify v4). |
The URLPattern doesn't have a bug. This is how it is intended. They don't decode a URL with the decodeURI function. Instead they "canonicalize" URL in the same way as a URL class constructor does. The main point is there is no specification on how dynamic pathname should be decoded. @Eomm If you know any spec or any real example that decodes a URL with params in the pathname, please share it with me. |
@mcollina Shouldn't we fix it then (maybe with another fix) and add a fix to the fastify v4? |
@ivan-tymoshenko I don't think there is a problem with the solution I suggested, there is a problem with the implementation of find-my-way. I checked the case from #234 on prostojs/router (an alternative to find-my-way) and it processes it just as you and I would expect. I guess this hardly solves your problem as you are probably heavily dependent on fastify and/or find-my-way. But it is what it is... |
@mav-rik Thanks for your reply. I will check how the prostojs/router works. Don't think about find-my-way implementation. Here is a problem described in one sentence. |
@ivan-tymoshenko so we can't tell where the param/wildcard part starts anymore - why so? This is how find-my-way implemented right? Could you give a test case from the real scenario? |
@mav-rik I checked how prostojs/router works. It double-decodes params. It's the easiest solution. Decode a full URL first and then decode a param. Unfortunately this is a vulnerability. const { ProstoRouter } = require('@prostojs/router')
const router = new ProstoRouter()
router.on('GET', '/:param', () => 'ok')
router.lookup('GET', '/param%2523').ctx.params // returns { param: 'param#' } |
@mav-rik Here is a real example. Pattern route: Expected param: |
Ok I guess I got what you mean. Oh boy, thanks for this finding! |
@mav-rik I don't want to discuss @prostojs/router here a lot. But as I understand you just removed URL components decoding. You still should decode URL components, but don't decode them twice or more. And even if you decide to not decode them at all you should return them raw and you don't do that either. Here as a result should be '#' and not "%23" |
@ivan-tymoshenko I agree that this is a wrong place do discuss prostojs/router. I also agree that my first fix wasn't very well balanced. This was challenging, but I finally achieved the goal: now I decode the URL but only once. Again, thanks for this findings. If you find more problems please open issue in prostojs/router directly and I'll be happy to work on it. |
194ce16
to
f275a1f
Compare
I discussed this case with the path-to-regex maintainer. Here is his conclusion about it. |
Closed in favour of #282 |
Fixed #234
The idea of this solution is to parse a param, not from bare URL but from decoded by decodeURI function and then parse decode only (# $ & + , / : ; = ? @).
!!! This solution has a problem. If we receive one of these (# $ & + , / : ; = ? @) symbols encoded twice, we will also decoded it twice. #207 (comment)
@mcollina @Eomm any thoughts on how to avoid this?