Fix parameters decoding #253

ivan-tymoshenko · 2022-04-19T13:30:54Z

Fixed #234

The idea of this solution is to parse a param, not from bare URL but from decoded by decodeURI function and then parse decode only (# $ & + , / : ; = ? @).

!!! This solution has a problem. If we receive one of these (# $ & + , / : ; = ? @) symbols encoded twice, we will also decoded it twice. #207 (comment)

@mcollina @Eomm any thoughts on how to avoid this?

mcollina · 2022-04-20T21:46:39Z

Given https://owasp.org/www-community/Double_Encoding, we can't regress on this topic.

ivan-tymoshenko · 2022-04-21T17:05:47Z

In that case, the path-to-regexp library has this problem

'use strict'

const { equal } = require('assert')
const { match } = require('path-to-regexp')

function doubleEncode (str) {
  return encodeURIComponent(encodeURIComponent(str))
}

function singleEncode (str) {
  return encodeURIComponent(str)
}

const handler = match('/🍌/:id', { decode: decodeURIComponent });

equal(handler(decodeURI(`/%F0%9F%8D%8C/${singleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, '#'); // equal
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, singleEncode('#')); // fails

mcollina · 2022-04-21T20:58:28Z

Should we just document the current behavior, or maybe adopt what URLPattern does?

ivan-tymoshenko · 2022-04-21T21:45:25Z

I'll think about it. URLPattern doesn't decode a URL at all. It encodes route and request URLs and then matches two encoded URLs. It uses URL class for it. URL class encodes a URL not just with encodeURI or encodeURIComponent function, it implements the basic URL parser (https://url.spec.whatwg.org/#concept-basic-url-parser) which has some conflicts with rules that find-my-way has. For example, we can have ':' and '#' in the find-my-way routes, but in the URL class, they are used as delimiters. This means we can't just use the URL class for our purposes as URLPattern does.

ivan-tymoshenko · 2022-04-23T16:28:39Z

There is another good solution to this issue. We can compare encoded URLs instead of comparing decoded URLs. I added a new encode option to the on method. If you set the option, the URL will be encoded before setting.

findMyWay.on('GET', '/🍌', handler)

findMyWay.find('GET', '/🍌') // this URL will match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL won't

findMyWay.on('GET', '/🍌', { encode: true }, handler)

findMyWay.find('GET', '/🍌') // this URL will NOT match the route
findMyWay.find('GET', '/%F0%9F%8D%8C') // this URL will match the route

path-to-regexp resolves this issue in the same way. There is a encode option. The only difference is that you should pass a function that will encode a path. I don't see the case when it will be useful to pass something except the encodeURI function.
https://github.com/pillarjs/path-to-regexp#process-pathname

!!! This is a semver-major change

We don't need to rush into this. It would be better to have as many reviews as possible before it is merged.

ivan-tymoshenko · 2022-04-23T17:03:43Z

Main branch

Current branch

mcollina

I would lgtm this but the question for us is if we skip this in Fastify v4 or not... and how we expose it there.

test/issue-206.test.js

mcollina · 2022-04-23T18:25:00Z

I think you have added a new option, could you document it too?

ivan-tymoshenko · 2022-04-23T19:36:13Z

I was trying to figure out how Express provides this path-to-regexp option. It looks like it just ignore it.

mcollina

lgtm

climba03003

Can we save the original path for display in prettyPrint method?

mcollina · 2022-05-05T12:08:37Z

This PR actually fixes a regression introduced in #207. Without this the following test would not pass:

diff --git a/test/find.test.js b/test/find.test.js
index 2de277f..9802494 100644
--- a/test/find.test.js
+++ b/test/find.test.js
@@ -15,3 +15,23 @@ test('find calls can pass no constraints', t => {
   t.ok(findMyWay.find('GET', '/a/b'))
   t.notOk(findMyWay.find('GET', '/a/b/c'))
 })
+
+test('find with encoded routes', t => {
+  t.plan(2)
+  const findMyWay = FindMyWay()
+
+  findMyWay.on('GET', '/a%20.md', () => {
+    t.pass('handler called')
+  })
+
+  t.ok(findMyWay.find('GET', '/a%20.md'))
+
+  findMyWay.lookup({
+    method: 'GET',
+    url: '/a%20.md'
+  }, {
+    end () {
+      t.fail('end should not be called')
+    }
+  })
+})

Note that the above test does not fail on v4. This is needed to update fastify-static to Fastify v4, so I'm inclined to release it as semver-minor.

@Eomm wdyt?

Eomm

@Eomm wdyt?

@mcollina I think this PR reverts the PR you pointed out, but it removes a feature.

The #207 focuses on decoding the path parameters

This PR, instead:

provides an encode option that acts during the path parameter digesting into the radix-tree
restoring the old logic and the old the lib doesn't handle url-encoded URI properly #206 issue. I recall that the main issue was the fast-decode-uri-component usage. I don't remember further details.

Regarding this PR, in fact, this test does not pass:

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/~smith/home.html', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/~smith/home.html'))
  t.ok(findMyWay.find('GET', '/%7Esmith/home.html'))
})

Instead, regarding the failing test you added, I think the correct endpoint to call would be:

t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md'))

because it would be the client that, finding a special character, should encode it.

At this point, I think we can't support both options

index.js

mcollina · 2022-05-05T12:58:33Z

~~Do you recommend we revert #207 instead? I really need the above bug fixed.~~

Reverting seems somewhat impossible.

ivan-tymoshenko · 2022-05-05T13:04:23Z

If you can point me to a problem, I can help.

mcollina · 2022-05-05T13:05:04Z

On:

t.ok(findMyWay.find('GET', '/a%2520.md'))
// instead of
t.ok(findMyWay.find('GET', '/a%20.md'))

Why? The file we want is /a .md, I don't think we should encode that.

mcollina · 2022-05-05T13:05:15Z

If you can point me to a problem, I can help.

I need #253 passing.

mcollina · 2022-05-05T13:08:29Z

@mcollina I think this PR reverts the PR you pointed out, but it removes a feature.

@Eomm what feature does it reverts? All tests seems passing.

ivan-tymoshenko · 2022-05-05T13:11:38Z

@mcollina It's a breaking change. You can't anymore match not decoded and one time decoded URL, setting only one not decoded URL. You need to set them both.

ivan-tymoshenko · 2022-05-05T13:14:29Z

You will match exactly that route, that you set up. encode options just helps you to encode your URL, so that you don't need to do it manually.

mcollina · 2022-05-05T13:29:26Z

I've resolved the problem with fastify-static by removing the encodeURI: https://github.com/fastify/fastify-static/blob/dee5d8014c0831dc4721f3b09062a75d7348eead/index.js#L341

Eomm · 2022-05-05T17:20:28Z

what feature does it reverts? All tests seems passing.

The issue-206.test.js file has changes that need to be added to pass:

it adds some routes that would be 404 otherwise: because the decoding is not executed
it adds the encode option for the same reason

I'll try to clarify my point: starting from the spec: https://www.rfc-editor.org/rfc/rfc3986#section-2.4

Because the percent (“%”) character serves as the indicator for
percent-encoded octets, it must be percent-encoded as “%25" for that
octet to be used as data within a URI

for this statement, I think that rollbacking the #207 PR is incorrect.

I think this module must support the following test without any additional options set by the developer:

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/~smith/home.html', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/~smith/home.html'))
  t.ok(findMyWay.find('GET', '/%7Esmith/home.html'))
})

So, I think this PR can add the encode option, but it needs to restore the old path parameter logic

ivan-tymoshenko · 2022-05-05T18:41:22Z

@Eomm Yes, some of the already created routes will not work. That is why this is a breaking change. But the main difference is that with this PR, you can create and match any route you want, and in the current state, you can't. For example, if the route contains an encoded part in a static and parametric part, you have no way to implement this.

test('find with encoded routes', t => {
  t.plan(2)
  const findMyWay = FindMyWay()

  findMyWay.on('GET', '/🍌smith/:param', () => {
    t.pass('handler called')
  })

  t.ok(findMyWay.find('GET', '/🍌smith/%2520.m'))
  t.ok(findMyWay.find('GET', '/%F0%9F%8D%8Csmith/%2520.m'))
})

ivan-tymoshenko · 2022-05-05T18:50:50Z

path-to-regexp and URLPattern also don't decode a url with the decodeURI function. path-to-regexp match exactly the same path as the path you registered and then can decode params with decodeURIComponent and URLPattern creates an instance of URL class which encodes a URL (non exactly with the encodeURI function, but in a similar way).

ivan-tymoshenko · 2022-05-05T19:00:33Z

If you want, we can set both raw and encoded URLs by default. I don't think it's a great idea, but it will make the number of routes that will fail after updating less.

ivan-tymoshenko · 2022-05-09T19:41:41Z

I'm trying to implement the second version of this fix. I want to make it work as URLPattern. The problem is that URLPattern also doesn't handle all cases that we want.

Examples that don't work:

const pattern = new URLPattern('https://host.com/[...]/a .html')
pattern.exec('https://host.com/%5B...%5D/a .html')

const pattern = new URLPattern('https://host.com/%20.html')
pattern.exec('https://host.com/%2520.html') // double encoding

ivan-tymoshenko · 2022-05-09T19:44:26Z

I don't want to leave this issue unresolved, because the current implementation has a significant bug and is not very fast.

ivan-tymoshenko · 2022-05-09T19:52:26Z

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

mcollina · 2022-05-10T06:44:57Z

The problem is that URLPattern also doesn't handle all cases that we want.

Open an issue on URLPattern.

mcollina · 2022-05-10T06:46:27Z

I'm not really convinced of this and @Eomm is not in agreement.

(I'm currently focusing on shipping Fastify v4).

ivan-tymoshenko · 2022-05-10T08:46:56Z

The URLPattern doesn't have a bug. This is how it is intended. They don't decode a URL with the decodeURI function. Instead they "canonicalize" URL in the same way as a URL class constructor does.
https://wicg.github.io/urlpattern/#canonicalize-a-pathname
https://url.spec.whatwg.org/#concept-basic-url-parser

The main point is there is no specification on how dynamic pathname should be decoded. @Eomm If you know any spec or any real example that decodes a URL with params in the pathname, please share it with me.

ivan-tymoshenko · 2022-05-10T11:33:56Z

@mcollina Shouldn't we fix it then (maybe with another fix) and add a fix to the fastify v4?

ivan-tymoshenko · 2022-05-10T13:31:19Z

@mav-rik Hi, you made a great job discovering problem #206. I want to ask you if you have time to help here. There is a problem with the solution you suggested.

mav-rik · 2022-05-10T17:22:25Z

@mav-rik Hi, you made a great job discovering problem #206. I want to ask you if you have time to help here. There is a problem with the solution you suggested.

@ivan-tymoshenko I don't think there is a problem with the solution I suggested, there is a problem with the implementation of find-my-way.
I agree with #234 that the test case with /%F0%9F%8D%8C-foo%23bar must be resolved to '🍌-foo#bar'.
I don't really want to deep dive into find-my-way implementation. Honestly, I find it a bit overwhelmed with unnecessary logic.

I checked the case from #234 on prostojs/router (an alternative to find-my-way) and it processes it just as you and I would expect. I guess this hardly solves your problem as you are probably heavily dependent on fastify and/or find-my-way. But it is what it is...

ivan-tymoshenko · 2022-05-10T17:28:37Z

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

@mav-rik Thanks for your reply. I will check how the prostojs/router works. Don't think about find-my-way implementation. Here is a problem described in one sentence.

mav-rik · 2022-05-10T17:38:24Z

I will explain the problem one more time. When we decode a path with the decodeURI function, it decodes some chars and shrinks the path, so we can't tell where the param/wildcard part starts anymore.

@ivan-tymoshenko so we can't tell where the param/wildcard part starts anymore - why so? This is how find-my-way implemented right? Could you give a test case from the real scenario?

ivan-tymoshenko · 2022-05-10T17:49:46Z

@mav-rik I checked how prostojs/router works. It double-decodes params. It's the easiest solution. Decode a full URL first and then decode a param. Unfortunately this is a vulnerability.
https://owasp.org/www-community/Double_Encoding

const { ProstoRouter } = require('@prostojs/router')

const router = new ProstoRouter()

router.on('GET', '/:param', () => 'ok')
router.lookup('GET', '/param%2523').ctx.params // returns { param: 'param#' }

ivan-tymoshenko · 2022-05-10T17:57:50Z

@mav-rik Here is a real example.

Pattern route: /~:param
Input route: /%7E%2523

Expected param: %23

mav-rik · 2022-05-10T18:04:36Z

@mav-rik I checked how prostojs/router works. It double-decodes params. It's the easiest solution. Decode a full URL first and then decode a param. Unfortunately this is a vulnerability. https://owasp.org/www-community/Double_Encoding
const { ProstoRouter } = require('@prostojs/router')

const router = new ProstoRouter()

router.on('GET', '/:param', () => 'ok')
router.lookup('GET', '/param%2523').ctx.params // returns { param: 'param#' }

Ok I guess I got what you mean. Oh boy, thanks for this finding!
I fixed that problem in prostojs/router. Here's the test.

ivan-tymoshenko · 2022-05-10T18:31:21Z

@mav-rik I don't want to discuss @prostojs/router here a lot. But as I understand you just removed URL components decoding. You still should decode URL components, but don't decode them twice or more. And even if you decide to not decode them at all you should return them raw and you don't do that either.

Here as a result should be '#' and not "%23"
https://github.com/prostojs/router/blob/e5e46940b0af2846a5d4464b243be7381fa4fe4d/src/router/router.spec.ts#L340

mav-rik · 2022-05-10T18:37:14Z

@mav-rik I don't want to discuss @prostojs/router here a lot. But as I understand you just removed URL components decoding. You still should decode URL components, but don't decode them twice or more. And even if you decide to not decode them at all you should return them raw and you don't do that either.

Here as a result should be '#' and not "%23" https://github.com/prostojs/router/blob/e5e46940b0af2846a5d4464b243be7381fa4fe4d/src/router/router.spec.ts#L340

@ivan-tymoshenko I agree that this is a wrong place do discuss prostojs/router. I also agree that my first fix wasn't very well balanced. This was challenging, but I finally achieved the goal: now I decode the URL but only once.

Again, thanks for this findings. If you find more problems please open issue in prostojs/router directly and I'll be happy to work on it.

ivan-tymoshenko · 2022-05-12T17:49:46Z

I discussed this case with the path-to-regex maintainer. Here is his conclusion about it.
Read from here: pillarjs/path-to-regexp#276 (comment)

ivan-tymoshenko · 2022-05-13T17:39:56Z

Closed in favour of #282

ivan-tymoshenko force-pushed the fix-parameters-decoding branch 4 times, most recently from c9fb58c to 66ea94b Compare April 23, 2022 15:18

ivan-tymoshenko marked this pull request as ready for review April 23, 2022 17:03

mcollina approved these changes Apr 23, 2022

View reviewed changes

test/issue-206.test.js Outdated Show resolved Hide resolved

mcollina approved these changes Apr 25, 2022

View reviewed changes

ivan-tymoshenko added bugfix semver-major labels Apr 28, 2022

ivan-tymoshenko mentioned this pull request Apr 30, 2022

Add option to encode URL before inserting fastify/fastify#3867

Closed

2 tasks

climba03003 reviewed May 1, 2022

View reviewed changes

Eomm reviewed May 5, 2022

View reviewed changes

index.js Show resolved Hide resolved

index.js Show resolved Hide resolved

climba03003 mentioned this pull request May 5, 2022

Update to fastify@4 fastify/fastify-static#292

Merged

3 tasks

Fix parameters decoding

5ed535f

ivan-tymoshenko force-pushed the fix-parameters-decoding branch from 7bf0b8d to 5ed535f Compare May 9, 2022 16:14

ivan-tymoshenko mentioned this pull request May 9, 2022

Optimize querystring trimming #276

Closed

ivan-tymoshenko mentioned this pull request May 10, 2022

Decode URL params prostojs/router#1

Closed

Set encode option true by default

f275a1f

ivan-tymoshenko force-pushed the fix-parameters-decoding branch from 194ce16 to f275a1f Compare May 11, 2022 11:20

ivan-tymoshenko closed this May 13, 2022

ivan-tymoshenko deleted the fix-parameters-decoding branch May 18, 2022 03:57

Fix parameters decoding #253

Fix parameters decoding #253

Conversation

ivan-tymoshenko commented Apr 19, 2022

mcollina commented Apr 20, 2022

ivan-tymoshenko commented Apr 21, 2022

mcollina commented Apr 21, 2022

ivan-tymoshenko commented Apr 21, 2022

ivan-tymoshenko commented Apr 23, 2022

ivan-tymoshenko commented Apr 23, 2022

mcollina left a comment

Choose a reason for hiding this comment

mcollina commented Apr 23, 2022

ivan-tymoshenko commented Apr 23, 2022

mcollina left a comment

Choose a reason for hiding this comment

climba03003 left a comment

Choose a reason for hiding this comment

mcollina commented May 5, 2022 • edited

Eomm left a comment

Choose a reason for hiding this comment

mcollina commented May 5, 2022 • edited

ivan-tymoshenko commented May 5, 2022

mcollina commented May 5, 2022

mcollina commented May 5, 2022

mcollina commented May 5, 2022

ivan-tymoshenko commented May 5, 2022

ivan-tymoshenko commented May 5, 2022

mcollina commented May 5, 2022

Eomm commented May 5, 2022

ivan-tymoshenko commented May 5, 2022

ivan-tymoshenko commented May 5, 2022

ivan-tymoshenko commented May 5, 2022

ivan-tymoshenko commented May 9, 2022

ivan-tymoshenko commented May 9, 2022

ivan-tymoshenko commented May 9, 2022

mcollina commented May 10, 2022

mcollina commented May 10, 2022

ivan-tymoshenko commented May 10, 2022

ivan-tymoshenko commented May 10, 2022 • edited

ivan-tymoshenko commented May 10, 2022

mav-rik commented May 10, 2022

ivan-tymoshenko commented May 10, 2022

mav-rik commented May 10, 2022 • edited

ivan-tymoshenko commented May 10, 2022

ivan-tymoshenko commented May 10, 2022

mav-rik commented May 10, 2022

ivan-tymoshenko commented May 10, 2022 • edited

mav-rik commented May 10, 2022 • edited

ivan-tymoshenko commented May 12, 2022 • edited

ivan-tymoshenko commented May 13, 2022

mcollina commented May 5, 2022 •

edited

mcollina commented May 5, 2022 •

edited

ivan-tymoshenko commented May 10, 2022 •

edited

mav-rik commented May 10, 2022 •

edited

ivan-tymoshenko commented May 10, 2022 •

edited

mav-rik commented May 10, 2022 •

edited

ivan-tymoshenko commented May 12, 2022 •

edited