Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parameters decoding #253

Closed

Conversation

ivan-tymoshenko
Copy link
Collaborator

Fixed #234

The idea of this solution is to parse a param, not from bare URL but from decoded by decodeURI function and then parse decode only (# $ & + , / : ; = ? @).

!!! This solution has a problem. If we receive one of these (# $ & + , / : ; = ? @) symbols encoded twice, we will also decoded it twice. #207 (comment)

@mcollina @Eomm any thoughts on how to avoid this?

@mcollina
Copy link
Collaborator

Given https://owasp.org/www-community/Double_Encoding, we can't regress on this topic.

@ivan-tymoshenko
Copy link
Collaborator Author

In that case, the path-to-regexp library has this problem

'use strict'

const { equal } = require('assert')
const { match } = require('path-to-regexp')

function doubleEncode (str) {
  return encodeURIComponent(encodeURIComponent(str))
}

function singleEncode (str) {
  return encodeURIComponent(str)
}

const handler = match('/🍌/:id', { decode: decodeURIComponent });

equal(handler(decodeURI(`/%F0%9F%8D%8C/${singleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, singleEncode('#')); // fails

@mcollina
Copy link
Collaborator

Should we just document the current behavior, or maybe adopt what URLPattern does?

@ivan-tymoshenko
Copy link
Collaborator Author

I'll think about it. URLPattern doesn't decode a URL at all. It encodes route and request URLs and then matches two encoded URLs. It uses URL class for it. URL class encodes a URL not just with encodeURI or encodeURIComponent function, it implements the basic URL parser (https://url.spec.whatwg.org/#concept-basic-url-parser) which has some conflicts with rules that find-my-way has. For example, we can have ':' and '#' in the find-my-way routes, but in the URL class, they are used as delimiters. This means we can't just use the URL class for our purposes as URLPattern does.

@ivan-tymoshenko ivan-tymoshenko force-pushed the fix-parameters-decoding branch 4 times, most recently from c9fb58c to 66ea94b Compare April 23, 2022 15:18
@ivan-tymoshenko
Copy link
Collaborator Author

There is another good solution to this issue. We can compare encoded URLs instead of comparing decoded URLs. I added a new encode option to the on method. If you set the option, the URL will be encoded before setting.

findMyWay.on('GET', '/🍌', handler)

findMyWay.find('GET', '/🍌') // this URL will match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL won't
findMyWay.on('GET', '/🍌', { encode: true }, handler)

findMyWay.find('GET', '/🍌') // this URL will NOT match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL will match the route

path-to-regexp resolves this issue in the same way. There is a encode option. The only difference is that you should pass a function that will encode a path. I don't see the case when it will be useful to pass something except the encodeURI function.
https://github.com/pillarjs/path-to-regexp#process-pathname

!!! This is a semver-major change

We don't need to rush into this. It would be better to have as many reviews as possible before it is merged.

@ivan-tymoshenko
Copy link
Collaborator Author

Main branch

main

Current branch

current

@ivan-tymoshenko ivan-tymoshenko marked this pull request as ready for review April 23, 2022 17:03
Copy link
Collaborator

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lgtm this but the question for us is if we skip this in Fastify v4 or not... and how we expose it there.

test/issue-206.test.js Outdated Show resolved Hide resolved
@mcollina
Copy link
Collaborator

I think you have added a new option, could you document it too?

@ivan-tymoshenko
Copy link
Collaborator Author

I was trying to figure out how Express provides this path-to-regexp option. It looks like it just ignore it.

Copy link
Collaborator

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@climba03003 climba03003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we save the original path for display in prettyPrint method?

@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

This PR actually fixes a regression introduced in #207. Without this the following test would not pass:

diff --git a/test/find.test.js b/test/find.test.js
index 2de277f..9802494 100644
--- a/test/find.test.js
+++ b/test/find.test.js
@@ -15,3 +15,23 @@ test('find calls can pass no constraints', t => {
   t.ok(findMyWay.find('GET', '/a/b'))
   t.notOk(findMyWay.find('GET', '/a/b/c'))
 })
+
+test('find with encoded routes', t => {
+  t.plan(2)
+  const findMyWay = FindMyWay()
+
+  findMyWay.on('GET', '/a%20.md', () => {
+    t.pass('handler called')
+  })
+
+  t.ok(findMyWay.find('GET', '/a%20.md'))
+
+  findMyWay.lookup({
+    method: 'GET',
+    url: '/a%20.md'
+  }, {
+    end () {
+      t.fail('end should not be called')
+    }
+  })
+})

Note that the above test does not fail on v4. This is needed to update fastify-static to Fastify v4, so I'm inclined to release it as semver-minor.

@Eomm wdyt?

Copy link
Contributor

@Eomm Eomm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eomm wdyt?

@mcollina I think this PR reverts the PR you pointed out, but it removes a feature.

The #207 focuses on decoding the path parameters

This PR, instead:

  • provides an encode option that acts during the path parameter digesting into the radix-tree
  • restoring the old logic and the old the lib doesn't handle url-encoded URI properly #206 issue. I recall that the main issue was the fast-decode-uri-component usage. I don't remember further details.

Regarding this PR, in fact, this test does not pass:

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/~smith/home.html', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/~smith/home.html'))
  t.ok(findMyWay.find('GET', '/%7Esmith/home.html'))
})

Instead, regarding the failing test you added, I think the correct endpoint to call would be:

t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md'))

because it would be the client that, finding a special character, should encode it.

At this point, I think we can't support both options

index.js Show resolved Hide resolved
index.js Show resolved Hide resolved
@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

Do you recommend we revert #207 instead? I really need the above bug fixed.

Reverting seems somewhat impossible.

@ivan-tymoshenko
Copy link
Collaborator Author

If you can point me to a problem, I can help.

@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

On:

t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md'))

Why? The file we want is /a .md, I don't think we should encode that.

@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

If you can point me to a problem, I can help.

I need #253 passing.

@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

@mcollina I think this PR reverts the PR you pointed out, but it removes a feature.

@Eomm what feature does it reverts? All tests seems passing.

@ivan-tymoshenko
Copy link
Collaborator Author

@mcollina It's a breaking change. You can't anymore match not decoded and one time decoded URL, setting only one not decoded URL. You need to set them both.

@ivan-tymoshenko
Copy link
Collaborator Author

You will match exactly that route, that you set up. encode options just helps you to encode your URL, so that you don't need to do it manually.

@mcollina
Copy link
Collaborator

mcollina commented May 5, 2022

I've resolved the problem with fastify-static by removing the encodeURI: https://github.com/fastify/fastify-static/blob/dee5d8014c0831dc4721f3b09062a75d7348eead/index.js#L341

@Eomm
Copy link
Contributor

Eomm commented May 5, 2022

what feature does it reverts? All tests seems passing.

The issue-206.test.js file has changes that need to be added to pass:

  • it adds some routes that would be 404 otherwise: because the decoding is not executed
  • it adds the encode option for the same reason

I'll try to clarify my point: starting from the spec: https://www.rfc-editor.org/rfc/rfc3986#section-2.4

Because the percent (“%”) character serves as the indicator for
percent-encoded octets, it must be percent-encoded as “%25" for that
octet to be used as data within a URI

for this statement, I think that rollbacking the #207 PR is incorrect.

I think this module must support the following test without any additional options set by the developer:

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/~smith/home.html', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/~smith/home.html'))
  t.ok(findMyWay.find('GET', '/%7Esmith/home.html'))
})

So, I think this PR can add the encode option, but it needs to restore the old path parameter logic

@ivan-tymoshenko
Copy link
Collaborator Author

@Eomm Yes, some of the already created routes will not work. That is why this is a breaking change. But the main difference is that with this PR, you can create and match any route you want, and in the current state, you can't. For example, if the route contains an encoded part in a static and parametric part, you have no way to implement this.

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/🍌smith/:param', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/🍌smith/%2520.m'))
  t.ok(findMyWay.find('GET', '/%F0%9F%8D%8Csmith/%2520.m'))
})

@ivan-tymoshenko
Copy link
Collaborator Author

path-to-regexp and URLPattern also don't decode a url with the decodeURI function. path-to-regexp match exactly the same path as the path you registered and then can decode params with decodeURIComponent and URLPattern creates an instance of URL class which encodes a URL (non exactly with the encodeURI function, but in a similar way).

@ivan-tymoshenko
Copy link
Collaborator Author

If you want, we can set both raw and encoded URLs by default. I don't think it's a great idea, but it will make the number of routes that will fail after updating less.

@ivan-tymoshenko
Copy link
Collaborator Author

I'm trying to implement the second version of this fix. I want to make it work as URLPattern. The problem is that URLPattern also doesn't handle all cases that we want.

Examples that don't work:

const pattern = new URLPattern('https://host.com/[...]/a .html')
pattern.exec('https://host.com/%5B...%5D/a .html')
const pattern = new URLPattern('https://host.com/%20.html')
pattern.exec('https://host.com/%2520.html') // double encoding

@ivan-tymoshenko
Copy link
Collaborator Author

I don't want to leave this issue unresolved, because the current implementation has a significant bug and is not very fast.

@ivan-tymoshenko
Copy link
Collaborator Author

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

@mcollina
Copy link
Collaborator

The problem is that URLPattern also doesn't handle all cases that we want.

Open an issue on URLPattern.

@mcollina
Copy link
Collaborator

I'm not really convinced of this and @Eomm is not in agreement.

(I'm currently focusing on shipping Fastify v4).

@ivan-tymoshenko
Copy link
Collaborator Author

The URLPattern doesn't have a bug. This is how it is intended. They don't decode a URL with the decodeURI function. Instead they "canonicalize" URL in the same way as a URL class constructor does.
https://wicg.github.io/urlpattern/#canonicalize-a-pathname
https://url.spec.whatwg.org/#concept-basic-url-parser

The main point is there is no specification on how dynamic pathname should be decoded. @Eomm If you know any spec or any real example that decodes a URL with params in the pathname, please share it with me.

@ivan-tymoshenko
Copy link
Collaborator Author

ivan-tymoshenko commented May 10, 2022

@mcollina Shouldn't we fix it then (maybe with another fix) and add a fix to the fastify v4?

@ivan-tymoshenko
Copy link
Collaborator Author

@mav-rik Hi, you made a great job discovering problem #206. I want to ask you if you have time to help here. There is a problem with the solution you suggested.

@mav-rik
Copy link

mav-rik commented May 10, 2022

@mav-rik Hi, you made a great job discovering problem #206. I want to ask you if you have time to help here. There is a problem with the solution you suggested.

@ivan-tymoshenko I don't think there is a problem with the solution I suggested, there is a problem with the implementation of find-my-way.
I agree with #234 that the test case with /%F0%9F%8D%8C-foo%23bar must be resolved to '🍌-foo#bar'.
I don't really want to deep dive into find-my-way implementation. Honestly, I find it a bit overwhelmed with unnecessary logic.

I checked the case from #234 on prostojs/router (an alternative to find-my-way) and it processes it just as you and I would expect. I guess this hardly solves your problem as you are probably heavily dependent on fastify and/or find-my-way. But it is what it is...

@ivan-tymoshenko
Copy link
Collaborator Author

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

@mav-rik Thanks for your reply. I will check how the prostojs/router works. Don't think about find-my-way implementation. Here is a problem described in one sentence.

@mav-rik
Copy link

mav-rik commented May 10, 2022

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

@ivan-tymoshenko so we can't tell where the param/wildcard part starts anymore - why so? This is how find-my-way implemented right? Could you give a test case from the real scenario?

@ivan-tymoshenko
Copy link
Collaborator Author

@mav-rik I checked how prostojs/router works. It double-decodes params. It's the easiest solution. Decode a full URL first and then decode a param. Unfortunately this is a vulnerability.
https://owasp.org/www-community/Double_Encoding

const { ProstoRouter } = require('@prostojs/router')

const router = new ProstoRouter()

router.on('GET', '/:param', () => 'ok')
router.lookup('GET', '/param%2523').ctx.params // returns { param: 'param#' }

@ivan-tymoshenko
Copy link
Collaborator Author

@mav-rik Here is a real example.

Pattern route: /~:param
Input route: /%7E%2523

Expected param: %23

@mav-rik
Copy link

mav-rik commented May 10, 2022

@mav-rik I checked how prostojs/router works. It double-decodes params. It's the easiest solution. Decode a full URL first and then decode a param. Unfortunately this is a vulnerability. https://owasp.org/www-community/Double_Encoding

const { ProstoRouter } = require('@prostojs/router')

const router = new ProstoRouter()

router.on('GET', '/:param', () => 'ok')
router.lookup('GET', '/param%2523').ctx.params // returns { param: 'param#' }

Ok I guess I got what you mean. Oh boy, thanks for this finding!
I fixed that problem in prostojs/router. Here's the test.

@ivan-tymoshenko
Copy link
Collaborator Author

ivan-tymoshenko commented May 10, 2022

@mav-rik I don't want to discuss @prostojs/router here a lot. But as I understand you just removed URL components decoding. You still should decode URL components, but don't decode them twice or more. And even if you decide to not decode them at all you should return them raw and you don't do that either.

Here as a result should be '#' and not "%23"
https://github.com/prostojs/router/blob/e5e46940b0af2846a5d4464b243be7381fa4fe4d/src/router/router.spec.ts#L340

@mav-rik
Copy link

mav-rik commented May 10, 2022

@mav-rik I don't want to discuss @prostojs/router here a lot. But as I understand you just removed URL components decoding. You still should decode URL components, but don't decode them twice or more. And even if you decide to not decode them at all you should return them raw and you don't do that either.

Here as a result should be '#' and not "%23" https://github.com/prostojs/router/blob/e5e46940b0af2846a5d4464b243be7381fa4fe4d/src/router/router.spec.ts#L340

@ivan-tymoshenko I agree that this is a wrong place do discuss prostojs/router. I also agree that my first fix wasn't very well balanced. This was challenging, but I finally achieved the goal: now I decode the URL but only once.

Again, thanks for this findings. If you find more problems please open issue in prostojs/router directly and I'll be happy to work on it.

@ivan-tymoshenko
Copy link
Collaborator Author

ivan-tymoshenko commented May 12, 2022

I discussed this case with the path-to-regex maintainer. Here is his conclusion about it.
Read from here: pillarjs/path-to-regexp#276 (comment)

@ivan-tymoshenko
Copy link
Collaborator Author

Closed in favour of #282

@ivan-tymoshenko ivan-tymoshenko deleted the fix-parameters-decoding branch May 18, 2022 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decoding URL Components
5 participants