Double decoding path parameters #276

ivan-tymoshenko · 2022-04-21T16:50:39Z

Hi, I have a question about decoding path params. I have a URL that contains encoded symbols in both its static and parametric parts. To match the route, I decode the URL using the decodeURI function, but the parameter contains one of the symbols (# $ & + , / : ; = ? @) that decodeURI does not decode. Those are decoded by the decodeURIComponent function instead. If someone encodes one of these symbols twice, it will also be decoded twice. I'm wondering whether double decoding would be a problem in this case?
https://owasp.org/www-community/Double_Encoding

'use strict'

const { equal } = require('assert')
const { match } = require('path-to-regexp')

function doubleEncode (str) {
  return encodeURIComponent(encodeURIComponent(str))
}

function singleEncode (str) {
  return encodeURIComponent(str)
}

const handler = match('/🍌/:id', { decode: decodeURIComponent });

equal(handler(decodeURI(`/%F0%9F%8D%8C/${singleEncode('#')}`)).params.id, '#'); // ok
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, '#'); // ok
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, singleEncode('#')); // fails

blakeembrey · 2022-04-22T04:05:51Z

There definitely would be if you're both using the decode option and decoding the URL before handling it. In that case you're probably better off encoding the input string to match using encodeURI and keeping the decode logic the same. I haven't thought it through 100%, but it seems logical enough.

blakeembrey · 2022-04-22T04:08:23Z

Have you considered using { encode: encodeURI, decode: decodeURIComponent }?
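
To illustrate the idea of encoding the pattern instead of decoding the input, here is a minimal sketch that uses only built-ins. The matchBanana function is a hypothetical stand-in for a router and is not path-to-regexp's API; it assumes the static prefix is stored in its encodeURI form and the parameter is decoded exactly once.

```javascript
'use strict'

// Hypothetical stand-in for a router (not path-to-regexp itself):
// the static prefix is kept encoded and the raw URL is never pre-decoded.
function matchBanana (rawPath) {
  const prefix = encodeURI('/🍌/') // '/%F0%9F%8D%8C/'
  if (!rawPath.startsWith(prefix)) return null

  const rawId = rawPath.slice(prefix.length)
  if (rawId === '' || rawId.includes('/')) return null

  // The parameter is decoded a single time, so double-encoded input
  // comes back single-encoded instead of fully decoded.
  return { id: decodeURIComponent(rawId) }
}

const single = encodeURIComponent('#') // '%23'
const double = encodeURIComponent(single) // '%2523'

console.log(matchBanana(`/%F0%9F%8D%8C/${single}`).id) // '#'
console.log(matchBanana(`/%F0%9F%8D%8C/${double}`).id) // '%23'
```

With this arrangement the third assertion from the original example would pass, because nothing decodes the parameter before the decode step runs.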

ivan-tymoshenko · 2022-04-25T10:13:14Z

I guess that can help. Thanks.

ivan-tymoshenko · 2022-05-11T13:26:49Z

Hi, I found a tricky case. Just want to ask what is the correct way to deal with it. How should I set this route?

Pattern route: /~:param
Input route: /%7E%2523

Expected param: %23

blakeembrey · 2022-05-11T22:13:41Z

This is definitely a very tricky problem. Every character can technically have 2 representations (or more, due to different ways to encode). I can think of a way to work with this. Two initial thoughts:

1. Configuration option that encodes (char: string) => string[] so every character can be represented in the regex
2. Guidance/utility that normalizes the input before passing it into path-to-regexp

Option 1 is simplest, but I'm leaning toward option 2 for performance reasons. I'll mull this over this week, but feel free to add your own thoughts on the topic 😄 Option 2 also helps with other routing libraries, since normalization is a common problem between them all. I also think that, conveniently, the same library could be used to normalize the input to path-to-regexp as the URL itself.

ivan-tymoshenko · 2022-05-11T22:38:05Z

About option 1. Each symbol in the path can also be represented in its percent-encoded form. If you add this option, that would mean that if I want to be sure my path matches in 100% of cases, I will need to add this function (char: string) => string[] for each symbol for each route. And you're right, it would be a significant performance drop.

Option 2. Can you give me an example of this "normalization"? Or an example of another routing lib that can do it?

ivan-tymoshenko · 2022-05-11T22:55:12Z

nodejs:

new URL('/%7E', 'http://test.c').pathname // /%7E

chrome:

new URL('/%7E', 'http://test.c').pathname // /~

blakeembrey · 2022-05-11T23:18:11Z

Oh wow, I didn't expect those to act differently, I would have expected both to return ~. I think https://web.dev/urlpattern/ will likely attempt to resolve these differences, but I'm not familiar with any other node.js path matching or router libraries working on this problem.

For option 2, it'd likely be written by hand. Something that has consistent rules for everything, and does the kind of normalization URL is meant to do plus some extras like changing repeated //// to single slash.

ivan-tymoshenko · 2022-05-11T23:24:21Z

About URLPattern. You can join the conversation at kenchris/urlpattern-polyfill#93

ivan-tymoshenko · 2022-05-11T23:31:11Z

About writing our own normalization lib. It should normalize but not decode the URL. I’m not sure that it is a good idea.

blakeembrey · 2022-05-11T23:38:13Z

Yep, exactly. To avoid these issues we'd need to focus on normalizing the format to something that won't be confused. E.g. always go to the encoded format.

blakeembrey · 2022-05-11T23:39:28Z

To clarify your example, the issue is mostly just the ~ not encoding by default. It'd work as expected using /%7E:param. So what we'd want to build is just a kind of superset of encodeURI. That way when you use it for path-to-regexp and for URL inputs both ~ and %7E end up as %7E and match.
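
A rough sketch of what such a superset of encodeURI might look like, assuming existing %XX escapes are preserved (only their hex case is canonicalized) and everything except ASCII letters, digits and "/" is encoded. The normalizePath name is hypothetical and not part of path-to-regexp.

```javascript
'use strict'

// Hypothetical helper, not part of path-to-regexp.
function normalizePath (path) {
  return path.replace(/%[0-9a-fA-F]{2}|[^A-Za-z0-9/]/gu, (token) => {
    // An existing escape is kept as-is apart from upper-casing its hex digits,
    // so already-encoded input is never encoded a second time.
    if (token.length === 3 && token[0] === '%') return token.toUpperCase()

    // encodeURIComponent handles multi-byte characters but leaves
    // - _ . ! ~ * ' ( ) alone, so those are escaped by hand.
    const encoded = encodeURIComponent(token)
    if (encoded !== token) return encoded
    return '%' + token.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
  })
}

// Only the static part of the pattern is normalized here; the ":param"
// syntax itself would have to be kept out of the normalizer.
const staticPart = normalizePath('/~')
const input = normalizePath('/%7E%2523')

console.log(staticPart) // '/%7E'
console.log(input) // '/%7E%2523'

// Both spellings now share the same prefix, and the rest is decoded once.
const rawParam = input.slice(staticPart.length)
console.log(decodeURIComponent(rawParam)) // '%23'
```

Because both the route text and the incoming URL go through the same function, it does not matter whether the client sent ~ or %7E.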

ivan-tymoshenko · 2022-05-11T23:46:15Z

I think that “~” should be the normalized form in the path.
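
A minimal sketch of normalizing in that direction, assuming the rule from RFC 3986 that percent-encoded unreserved characters (letters, digits, "-", ".", "_", "~") can be decoded without changing the meaning of the URL. The decodeUnreserved name is hypothetical.

```javascript
'use strict'

// Hypothetical helper: decodes only escapes of unreserved characters
// and leaves every other escape in place (with upper-cased hex digits).
function decodeUnreserved (path) {
  return path.replace(/%[0-9a-fA-F]{2}/g, (escape) => {
    const char = String.fromCharCode(parseInt(escape.slice(1), 16))
    return /[A-Za-z0-9\-._~]/.test(char) ? char : escape.toUpperCase()
  })
}

const input = decodeUnreserved('/%7E%2523')
console.log(input) // '/~%2523'

// The static "/~" of the pattern /~:param now matches literally,
// and the remaining parameter is decoded once.
console.log(decodeURIComponent(input.slice('/~'.length))) // '%23'
```

Reserved characters such as %23 or %2F stay encoded, so the path is normalized but not decoded.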

ivan-tymoshenko · 2022-05-11T23:50:57Z

Encoding all not encoded chars for each request would be expensive.

blakeembrey · 2022-05-11T23:53:08Z

If you run both inputs through the same "encoder/normalization" it technically shouldn't matter. Trivial sample code would be x.replace(/[^%a-z0-9]/g, x => '%' + x.charCodeAt(0).toString(16)) (pretty sure this'll break but the example stands).

> Encoding all not encoded chars for each request would be expensive.

Not really, but there's also no other solution to what you're asking for.

blakeembrey · 2022-05-11T23:57:17Z

Another trivial example is handling of (space) - some servers expect + to work, others %20, this is the kind of thing that could be normalized by a library. Alternatively, you can decide it's not worth handling as-is and just require the client to be using ~ which is all the browsers encode it as.

blakeembrey · 2022-05-11T23:59:04Z

For example, it's not as if GitHub supports URL-encoding all the random characters in this URL and still has it work. There are only certain normalizations they apply before trying to route. If we're worried about performance, it's better to leave that decision up to people's frameworks or applications.

import-brain added the question label Apr 23, 2022

ivan-tymoshenko closed this as completed Apr 25, 2022

ivan-tymoshenko reopened this May 11, 2022

ivan-tymoshenko mentioned this issue May 12, 2022

Fix parameters decoding delvedor/find-my-way#253

Closed
