Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double decoding path parameters #276

Open
ivan-tymoshenko opened this issue Apr 21, 2022 · 17 comments
Open

Double decoding path parameters #276

ivan-tymoshenko opened this issue Apr 21, 2022 · 17 comments
Labels

Comments

@ivan-tymoshenko
Copy link

ivan-tymoshenko commented Apr 21, 2022

Hi, I have a question about decoding path params. I have a URL that contains encoded symbols in its static and parametric parts. To match the route, I decode the URL using the decodeURI function, but in the parameter, I have one of these symbols (# $ & + , / : ; = ? @) that doesn't decode by decodeURI. It decodes with the decodeURIComponent function. If someone encodes one of these symbols twice, it will be decoded also twice. I'm wondering if there would be a problem of double decoding in this case?
https://owasp.org/www-community/Double_Encoding

'use strict'

const { equal } = require('assert')
const { match } = require('path-to-regexp')

function doubleEncode (str) {
  return encodeURIComponent(encodeURIComponent(str))
}

function singleEncode (str) {
  return encodeURIComponent(str)
}

const handler = match('/🍌/:id', { decode: decodeURIComponent });

equal(handler(decodeURI(`/%F0%9F%8D%8C/${singleEncode('#')}`)).params.id, '#'); // ok
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, '#'); // ok
equal(handler(decodeURI(`/%F0%9F%8D%8C/${doubleEncode('#')}`)).params.id, singleEncode('#')); // fails
@blakeembrey
Copy link
Member

There definitely would be if you're using both decode and decoding it before handling. In that case you're probably better off encoding the input string to match using encodeURI and keeping the decode logic the same. Haven't thought it through 100% sure but it seems logical enough.

@blakeembrey
Copy link
Member

Have you considered using { encode: encodeURI, decode: decodeURIComponent }?

@ivan-tymoshenko
Copy link
Author

I guess that can help. Thanks.

@ivan-tymoshenko
Copy link
Author

Hi, I found a tricky case. Just want to ask what is the correct way to deal with it. How should I set this route?

Pattern route: /~:param
Input route: /%7E%2523

Expected param: %23

@blakeembrey
Copy link
Member

This is definitely a very tricky problem. Every character can technically have 2 (or more due to different ways to encode). I can think of a way to work with this. Two initial thoughts:

  1. Configuration option that encodes (char: string) => string[] so every character can be represented in the regex
  2. Guidance/utility that normalizes before passing it into path to regexp

Option 1 is simplest, but I'm leaning toward option 2 for performance reasons. I'll mull this over this week, but feel free to add your own thoughts on the topic 😄 Option 2 also helps with other routing libraries, since normalization is a common problem between them all. I also think that, conveniently, the same library could be used to normalize the input to path-to-regexp as the URL itself.

@ivan-tymoshenko
Copy link
Author

About 1 option. Each symbol in the path can be represented in the ASCII encoding. If you add this option, that would mean if I want to be sure that my path match in 100% of cases, I will need to add these function (char: string) => string[] for each symbol for each route. And you're right, it would be a significant performance drop.

2 option. Can you give me an example of this "normalization"? Or an example of another routing lib, that can it?

@ivan-tymoshenko
Copy link
Author

nodejs:

new URL('/%7E', 'http://test.c').pathname // /%7E

chrome:

new URL('/%7E', 'http://test.c').pathname // /~

@blakeembrey
Copy link
Member

blakeembrey commented May 11, 2022

Oh wow, I didn't expect those to act differently, I would have expected both to return ~. I think https://web.dev/urlpattern/ will likely attempt to resolve these differences, but I'm not familiar with any other node.js path matching or router libraries working on this problem.

For option 2, it'd likely be written by hand. Something that has consistent rules for everything, and does the kind of normalization URL is meant to do plus some extras like changing repeated //// to single slash.

@ivan-tymoshenko
Copy link
Author

About URLPattern. You can join to the conversation kenchris/urlpattern-polyfill#93

@ivan-tymoshenko
Copy link
Author

About writing own normalization lib. It should normalize but not decode the URL. I’m not sure that it is a good idea.

@blakeembrey
Copy link
Member

Yep, exactly. To avoid these issues we'd need to focus on normalizing the format to something that won't be confused. E.g. always go to the encoded format.

@blakeembrey
Copy link
Member

blakeembrey commented May 11, 2022

To clarify your example, the issue is mostly just the ~ not encoding by default. It'd work as expected using /%7E:param. So what we'd want to build is just a kind of superset of encodeURI. That way when you use it for path-to-regexp and for URL inputs both ~ and %7E end up as %7E and match.

@ivan-tymoshenko
Copy link
Author

I think that “~” should be normalized path.

@ivan-tymoshenko
Copy link
Author

Encoding all not encoded chars for each request would be expensive.

@blakeembrey
Copy link
Member

blakeembrey commented May 11, 2022

If you run both inputs thought the same "encoder/normalization" it technically shouldn't matter. Trivial sample code would be x.replace(/[^%a-z0-9]/g, x => '%' + x.charCodeAt(0).toString(16)) (pretty sure this'll break but the example stands).

Encoding all not encoded chars for each request would be expensive.

Not really, but there's also no other solution to what you're asking for.

@blakeembrey
Copy link
Member

blakeembrey commented May 11, 2022

Another trivial example is handling of (space) - some servers expect + to work, others %20, this is the kind of thing that could be normalized by a library. Alternatively, you can decide it's not worth handling as-is and just require the client to be using ~ which is all the browsers encode it as.

@blakeembrey
Copy link
Member

For example, it's not as if GitHub supports URL encoding all the random characters in this URL and have it still work. There's only certain normalizations they apply before trying to route. If we're worried about performance it's better to leave that decision up to people's frameworks or applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants