Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path is decoded in createLocation #505

Closed
ArthurKnaus opened this issue Jul 26, 2017 · 72 comments · Fixed by #656
Closed

Path is decoded in createLocation #505

ArthurKnaus opened this issue Jul 26, 2017 · 72 comments · Fixed by #656

Comments

@ArthurKnaus
Copy link

This results in an invalid URL being pushed to the HTML5 history element when there are encoded percent signs in the pathname.

E.g.:

history.push({
  pathname: `randomlocation/${ encodeURIComponent("AB%C") }/furthermore`
});

results in "randomlocation/AB%C/futhermore" being pushed to the HTML5 history.

@pshrmn
Copy link
Contributor

pshrmn commented Jul 26, 2017

Alright, so just to be clear, the reasoning behind decoding in createLocation is that it allows us to have location object's whose pathname is decoded. That means that React Router users can use decoded strings in their route paths.

// with decoding
<Route path='/역사' />
// without decoding
<Route path='/%EC%97%AD%EC%82%AC' />

Another effect of this is that params will be automatically decoded.

<Route path='/greeting/:word' />
// user visits /greeting/안녕하세요
match === {
  ...,
  params: { word: '안녕하세요' }
  // without decoding this would be %EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94
}

History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.

There are a few thought I have on possible ways to deal with this:

Stop all decoding

I don't think that this is desirable, but I'll at least throw it out there. We could leave the decoding to the program that is using history's locations. location.pathname would just be what the user provides. Of course, this would lead to to inconsistencies between the locations created from window.location and locations created from values passed by the user.

// URI = '/역사'
// window.location.pathname = '/%EC%97%AD%EC%82%AC'
{ pathname: '/%EC%97%AD%EC%82%AC', ...}

history.push('/역사')
{ pathname: '/역사', ...}

Add a raw property

Before we decode the pathname, we can store the original value as location.raw (or maybe rawPathname, name isn't important here). Then, when we create a URI, we would concatenate using location.raw instead of location.pathname.

Prevent decoding of %25

Instead of just calling location.pathname = decodeURI(location.pathname), we could modify that to skip decoding the character string %25, which is % encoded.

const decodeButSkipPercent = pathname => (
  pathname.split('%25')
    .map(decodeURI)
    .join('%25')
)

That would work, but we would also end up with location's whose pathname is not necessarily fully decoded. The exception should only apply to the encoded percent sign, but it is still an exception.


I'm not sure which of these is the best choice (or maybe there is a better alternative that I did not think of?) 🤷‍♀️

@dominykas
Copy link

Of course, this would lead to to inconsistencies between the locations created from window.location and locations created from values passed by the user.

I'm sorry, I'm not familiar with the code, but is it not possible to decodeURI only and only when creating a location from window.location? I'd assume the apps/devs would never read the location directly from there, because they access it via this lib? This is the place where history does know the context and can behave accordingly?

@kellyrmilligan
Copy link

what is the recommended workaround for this? in components, the route params are only partially decoded, see referenced issue. we have file names that are coming from users, and sometimes they have ? in the name.

@janhoeck
Copy link

I have the same problem.
I try to push this to the history:
history.push('/data/Amazing%20Example')

and history converts this to the url: http://localhost:3000/data/Amazing Example

History removes my empty space! But why :)

@Kapaacius
Copy link

I guess I have a similar issue as @janhoeck
I want to encode part of my link because it contains stringified object
to={ /123/${encodeURIComponent(JSON.stringify({ encode: "me" }))} }
But when I now click on the link, it ends up being
#/123/{"encode"%3A"me"}
this on the url...it kinda works but what dies for me is when someone pastes such URL to slack or some other location. Clicking on that does another set of encoding magic and then the url does not work anymore.

How can I work around it? ( 2x encodeURIComponent is not really a nice fix...but would kinda help right now :D )

@OliverJAsh
Copy link
Contributor

@pshrmn

History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.

Do you know why browsers don't encode the % character, but encode everything else? Is it a bug or intentional for some reason? Is it applicable in all browsers?

@pshrmn
Copy link
Contributor

pshrmn commented Jan 18, 2018

@OliverJAsh Browsers use percent-encoding, so if they encoded the %, that could break pre-encoded paths. From https://tools.ietf.org/html/rfc3986#section-2.4 (emphasis mine).

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI. Implementations must not
percent-encode or decode the same string more than once, as decoding
an already decoded string might lead to misinterpreting a percent
data octet as the beginning of a percent-encoding, or vice versa in
the case of percent-encoding an already percent-encoded string.

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 18, 2018

Ah. I've noticed browsers also do this when you enter a URL in the address bar, e.g. https://google.com/?search=sale 50% off redirects to https://google.com/?search=sale%2050%%20off instead of the correct https://google.com/?search=sale%2050%25%20off.

IIUC: as a user, when entering the URL in the address bar, we know the query params are not encoded, but the computer does not!

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 18, 2018

Perhaps one argument in favour of stopping all decoding in this module is for consistency with HTML5's native history. Currently we have this inconsistency:

history.push(`/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%%20bar"
// => `history.location.pathname` is `/foo/foo % bar`

window.history.pushState(null, '', `/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%25%20bar"
// => `window.location.pathname` is `/foo/foo%20%25%20bar`

Both the generated URL and pathname are different in these two cases.

@mjackson
Copy link
Member

I think my vote is to make our push as consistent with pushState as we can. That just makes sense.

@pshrmn Isn't there some other way we can provide decoded parameters to RR users? Instead of decoding the pathnames in history, can we do it later in the router (i.e. just before we need to match with path-to-regexp)?

@pshrmn
Copy link
Contributor

pshrmn commented Jan 19, 2018

RR could decode in matchPath (or maybe higher up the chain/memoized so it isn't called as often). Off the top of my head I can't recall if there was a reason that was avoided before, but I want to say that the idea was that doing it in history would make it available to all history users, not just RR users.

If decoding is removed, there will be an inconsistency in location objects unless history expects to be given encoded pathnames. Whether or not there are real consequences to this, I am not sure (pushing the same location and whether that pushes/replaces is the one that comes to mind).

@pshrmn
Copy link
Contributor

pshrmn commented Jan 19, 2018

I haven't thought through the implications, but if you wanted to enforce that pathnames are encoded for consistency, that could be done using an anchor. That would leave encoded ones alone, but ensure anything that should be encoded is.

function encode(pathname) {
  const a = document.createElement('a');
  a.setAttribute('href', pathname);
  return a.pathname;
}

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 22, 2018

@pshrmn

This appears to work for all characters except for the percent sign.

It also appears this doesn't work for & characters. To test this:

<a href="https://google.com/foo & bar"/>go</a>

When clicked, the browser navigates to https://google.com/foo%20&%20bar (& is not URI encoded), instead of the https://google.com/foo%20%26%20bar (& is URI encoded).

I don't know whether this needs to be considered as part of this or not—just pointing it out.

@pshrmn
Copy link
Contributor

pshrmn commented Jan 22, 2018

@OliverJAsh that is the search segment of a URI, which the history package doesn't touch.

@OliverJAsh
Copy link
Contributor

@pshrmn I've updated my example to show the same problem for pathnames.

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 22, 2018

@pshrmn In fact it would seem the browser only URI encodes spaces, unlike encodeURIComponent. Example: http://output.jsbin.com/ceqerer/1.

I presume it's using encodeURI under the hood, which unlike encodeURIComponent doesn't encode reserved characters (&, …).

(My example uses query parameters for easy testing, but the same browser mechanism is used for pathnames.)

@pshrmn
Copy link
Contributor

pshrmn commented Jan 22, 2018

I'm trying to remember exactly what I was talking about in that comment (it has been six months since I wrote it). I believe that what I meant by "this appears to work for all characters except for the percent sign" is that the percent sign is the only character that we cannot safely decode and expect the browser to properly encode.

From MDN, decodeURI (which is what history is using to decode pathnames) does the following:

Replaces each escape sequence in the encoded URI with the character that it represents, but does not decode escape sequences that could not have been introduced by encodeURI. The character “#” is not decoded from escape sequences.

Also from MDN, the following characters (which does not include the percent sign) are not encoded by encodeURI (so the encoded sequences for these characters will not be decoded by decodeURI):

A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #

history.push('/test&ing');
const decoded = decodeURI('/test&ing'); // /test&ing
history.pushState(null, '', decoded)
// window.location.pathname = '/test&ing'

decodeURI won't change the escape sequence %26 because it could not have been introduced by encodeURI

history.push('/test%26ing');
const decoded = decodeURI('/test%26ing'); // /test%26ing
// window.location.pathname = '/test%26ing'

encodeURI can encode a percent sign (encodeURI('%') === '%25'), so decodeURI will decode %25. However, the browser will not encode the percent sign (because it would not want to encode %26 into %2526), which is the source of this issue.

history.push('/test%25ing'); // %25 = encodeURI('%')
const decoded = decodeURI('/test%25ing'); // /test%ing
// window.location.pathname = '/test%ing'

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 22, 2018

@pshrmn That makes sense!

IIUC, this issue only concerns the percent symbol and not other characters that will be decoded by decodeURI because the percent symbol is particularly problematic as the browser will not re-encode it, unlike other characters.

However, it is quite surprising to find that some characters in location.pathname (and location params in React Router) are decoded whilst others are not. As you have explained, this is because decodeURI is only permitted to decode some characters. However, I would expect consistency with window.location.pathname:

const myLocation = history.createLocation({ pathname: `/search/${encodeURIComponent('foo & bar')}` });

historyInstance.push(myLocation);

myLocation.pathname
// => "/search/foo %26 bar"

window.location.pathname
// => "/search/foo%20%26%20bar"

Even if most of the decoded characters are correctly re-encoded by the browser, this is still an issue when working with location.pathname (or location params in React Router) directly.

I think this is also called out in remix-run/react-router#5607.

@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 26, 2018

For anyone who runs into this issue and is looking for workarounds…

On unsplash.com we are working around this by double URI encoding:

history.push({
  pathname: `randomlocation/${ encodeURIComponent(encodeURIComponent("AB%C")) }/furthermore`
});

This results in randomlocation/AB%25C/futhermore being pushed to the HTML5 history, which is as expected.

However, this has unintended side effects.

If a user navigates directly to randomlocation/AB%25C/futhermore, location.pathname will be randomlocation/AB%C/futhermore (decoded).

If we push randomlocation/AB%25C/futhermore to the user's history using the above workaround (as randomlocation/AB%2525C/furthermore), location.pathname will be randomlocation/AB%25C/futhermore (encoded).

(The same applies to location params.)

To workaround this side effect, we run a "safe decode":

const safeDecode = (pathname) => {
  try {
    return decodeURIComponent(pathname);
  } catch (_error) {
    return pathname;
  }
}

I'm sure this workaround has further implications…

I'd be very happy to help getting this issue fixed. Is there enough consensus for us to go with one of the proposed solutions?

@kevinliang
Copy link

@OliverJAsh Double encoding doesn't work because if the user hits the back button then the forward button, the error will throw again.

@OliverJAsh
Copy link
Contributor

@kevinliang What exactly would throw again? I tried what you said and I didn't get any exception.

@cephirothdy2j
Copy link

cephirothdy2j commented Oct 19, 2018

Here's how we're getting around the issue in our project:

  1. Switch your project to use Yarn if necessary.
  2. In package.json, add "history": "4.5.1" to your project's "dependencies".
  3. Also add "history": "4.5.1" to your "resolutions".
  4. You likely will have to encodeURIComponent and decodeURIComponent as you push entries to history and read params off of the URL.

Per the library's changelog, the encoding / decoding stuff was first added in v4.6.0, and v4.5.1 is the most recent version published without these changes.

By making history a direct dependency of our project and having any libraries that rely on it resolve to our selective dependency, we get around the encoding / decoding mess. Yes, we do miss out on the benefits this functionality provides - as well as any other upgrades and fixes published to history since 4.5.1 - but we were not using any of those other features anyways.

@Andrekra
Copy link

Andrekra commented Nov 21, 2018

For me it seems the decodeURI is the culprit.

I tried to visit an url with reserved characters (? # % / ) so I use encodeURIComponent to build the url. /the/path%23%25%3F%2F. But the decodeURI in createLocation, changes it to /the/path/%23%%3F%2F Which cant be decoded with decodeURIComponent or used re-encode with encodeURI in the hope to get the original string back.

// User
let pathname = encodeURIComponent('%?#') // %25%3F%23

// History
pathname = decodeURI(patname) // %%3F%23

// User Trying to get the ```%?#``` back
decodeURIComponent(pathname) // Uncaught URIError: URI malformed
// Another attempt, encode it back to get the original pathname
ecodeURI(pathname) // %25%253F%2523 Huh? Thats not my original string. Double encode the %

From random googling, people said the decodeURI/encodeURI should be used on full url's, while decodeURIComponent/encodeURIComponent on parts (component) of the url. The pathname is a component. More info here for the difference between them https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI#Description

I've wrote a few tests. If you change decodeURI to decodeURIComponent these will pass:

  describe('with a path that can be decoded', () => {
    it('handles encoded path', () => {
      expect(
        createLocation({pathname: '/the/path%23%25%3F%2F'})
      ).toMatchObject({
        pathname: '/the/path#%?/',
        search: '',
        hash: ''
      });
    });

    it('handles not encoded path', () => {
      expect(
        createLocation('/the/path?the=query#the-hash')
      ).toMatchObject({
        pathname: '/the/path',
        search: '?the=query',
        hash: '#the-hash'
      });
    });
  });

There are 3 (well 1 for 3 histories) tests failing, but I think that would be a false negative:

        Expected value to match object:
          {"pathname": "/view/%23abc"}
        Received:
          {"hash": "", "key": "rw3pt0", "pathname": "/view/#abc", "search": "", "state": undefined}

A problem would arise, when you decode %23 as an hash, and later on would re-interpertate the hash as a valid fragment (if this would be possible). Safest way would be to store the original path somewhere.

Edit:

I also discovered a second issue, related to this.
If you refresh on an URL with a safe character encoded, it's decoded (PUSH). If you use POP, the path stays as is.

const history = createBrowserHistory()
console.log('INIT:', history.location.pathname)
history.listen(location => {
  console.log(history.location.pathname)
})

// Refresh on page /the/path%23%25%3F%2F
=> INIT /the/path/%23%%3F%2F (POP, decoded, notice the double %)
// Go back and forth in in browser history
=>  /the/path/%23%25%3F%2F (PUSH, no decode)

This means I sometime have to decode twice, and sometimes once, if I use the double encode method described.

My current workaround:

const decode = str => {
  try {
    return decodeURIComponent(decodeURIComponent(str))
  } catch (e) {
    try {
      return decodeURIComponent(str)
    } catch {
      return str
    }
  }
}

const encode = str => encodeURIComponent(encodeURIComponent(str))

@Dalzhim
Copy link

Dalzhim commented Dec 21, 2018

Invoking decodeURI(location.pathname) on URLs before handling them to the browser's history is a major issue. First of all, it forces the browser to store malformed URLs (i.e. the original url https://mydomain.com/test%25 is modified to be https://mydomain.com/test%). Any browser vendor could disregard such blatant misuse of the API and completely break every user of this library when modifying their API to throw when pushing/replacing invalid URLs. They could also very well silently discard malformed URLs and never allow popping them back.

In this thread, workarounds were suggested as to decode everything except % signs. I would argue that it is a bad idea because you're going to surprise some of your indirect users who are going to have a hard time when their RR matcher doesn't work the same way once the % character is part of the URL (i.e. URLs are partially decoded except %25, which means you need to match against %25 rather than %).

Even though this represents a breaking change for people who came to rely on this implicit decoding, it is also a mandatory bug fix as applications relying directly or indirectly on this library are broken in regards to the % character and there is no reasonable workaround without fixing the root cause.

In conclusion, better figure out a path to make the breaking change at the time of your choosing rather than wait for a browser vendor to force you into it. In the mean time, the main Readme should list this as a known issue.

@Prophet32j
Copy link

We are ditching RR for @reach router because of this massive issue.

OliverJAsh added a commit to OliverJAsh/history that referenced this issue Jan 10, 2019
@OliverJAsh
Copy link
Contributor

OliverJAsh commented Jan 10, 2019

Trying to figure out where this issue now stands and how we can move forward.

I think you're right, @OliverJAsh. We should have left the pathname encoded and then only decoded it when we needed to make the match in React Router. This is the way previous versions of the router worked.

Now, the question is how to get there from here w/out breaking everyone's apps. This would be a breaking change, so we would need to push a new major version.

#505 (comment)

Plan:

  1. Raise PR to remove decoding from history
  2. Release a new version of history with this change
  3. Raise PR to upgrade history in react-router and handle decoding internally
  4. Release a new version of react-router with this change

I've started 1 in #656.

@kevinkyyro
Copy link

kevinkyyro commented Jan 22, 2019

As a matter of API design, if you want to allow users to provide decoded paths, you could accept a list instead of a string:

{
  pathname: '/search/a%2Fb%20testing/books'
}
// should be equivalent to something like
{ 
  pathname: ['search', 'a/b testing', 'books'] 
}

The relationship between each form should be approximately:

  • listPathname = stringPathname.replace(/^\/|\/$/g, '').split('/').map(decodeURIComponent)
  • stringPathname = '/' + listPathname.map(encodeURIComponent).join('/')

@OliverJAsh
Copy link
Contributor

Update on #505 (comment). We've been waiting for a few weeks now for feedback from a maintainer of this repo. No signs yet. :-(

In the meantime, I thought I'd share how we're working around this at Unsplash. We use patch-package, which lets us modify the contents of node_modules. Using this, we can remove the decoding. This has fixed all of our problems.

patch-package
--- a/node_modules/history/LocationUtils.js
+++ b/node_modules/history/LocationUtils.js
@@ -44,15 +44,9 @@ var createLocation = exports.createLocation = function createLocation(path, stat
     if (state !== undefined && location.state === undefined) location.state = state;
   }
 
-  try {
-    location.pathname = decodeURI(location.pathname);
-  } catch (e) {
-    if (e instanceof URIError) {
-      throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-    } else {
-      throw e;
-    }
-  }
+  // Waiting for upstream PR to be merged
+  // https://github.com/ReactTraining/history/issues/505#issuecomment-453175833
+  // https://github.com/ReactTraining/history/pull/656
 
   if (key) location.key = key;
 
--- a/node_modules/history/es/LocationUtils.js
+++ b/node_modules/history/es/LocationUtils.js
@@ -31,16 +31,6 @@ export var createLocation = function createLocation(path, state, key, currentLoc
     if (state !== undefined && location.state === undefined) location.state = state;
   }
 
-  try {
-    location.pathname = decodeURI(location.pathname);
-  } catch (e) {
-    if (e instanceof URIError) {
-      throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-    } else {
-      throw e;
-    }
-  }
-
   if (key) location.key = key;
 
   if (currentLocation) {
--- a/node_modules/history/umd/history.js
+++ b/node_modules/history/umd/history.js
@@ -157,16 +157,6 @@ return /******/ (function(modules) { // webpackBootstrap
 	    if (state !== undefined && location.state === undefined) location.state = state;
 	  }
 
-	  try {
-	    location.pathname = decodeURI(location.pathname);
-	  } catch (e) {
-	    if (e instanceof URIError) {
-	      throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-	    } else {
-	      throw e;
-	    }
-	  }
-
 	  if (key) location.key = key;
 
 	  if (currentLocation) {

spong added a commit to elastic/kibana that referenced this issue Mar 7, 2019
This PR covers the Routing + Page Skeleton of the Network IP Details page, including:
* New route `app/secops#/network/ip/:ipv4_or_ipv6`
* Empty page `ip_details` following convention from `host_details`
* Links to IP fields for following components:
  * Authentications Table
  * Events Table
  * Host Summary
  * Network Top N Flow Table
  * Timeline Table
* HoC `HostDetailsLink` and `IPDetailsLink` for easily linking to the `HostDetails` or `IP Details` pages
* Refactored `Breadcrumb` component and navigation logic using react-router's `withRouter`
* Added `encodeIpv6` & `decodeIpv6` helper functions (see note below)
* Added tests for:
  * HeaderBreadcrumbsComponent
  * Custom links `HostDetailsLink` and `IPDetailsLink`
  * ipv6 helper functions

Includes fixes for:
* https://github.com/elastic/ingest-dev/issues/310
* https://github.com/elastic/ingest-dev/issues/234

Note: Due to remix-run/history#505 and the way SecOp's react-router is nested within Kibana's native routing (via `link-to`), URL's that contain characters that require URL Encoding (like colons in ipv6) can break the back button. This occurs as react-router adds a history entry for the decoded URL, so when pressing the back button the user returns to the decoded `link-to` url, which in turns re-routes them back to the page they were on. Speaking with @XavierM, we decided an initial solution for this would be to replace the `:`'s in ipv6's with `-`'s. As referenced in the issue above, it looks as if there is potential for a more robust solution we could implement at the `link-to` layer of our application. In the meantime, our current solution maintains the functionality of the back button, and also allows the user to manually input ipv6's with `:`'s into the address bar without breaking navigation.

![host_ip](https://user-images.githubusercontent.com/2946766/53674424-18ee6080-3c4b-11e9-8149-cb4dd048ca70.gif)
@OliverJAsh
Copy link
Contributor

There's some discussion in #656 about what the correct behaviour is (as opposed to decoding pathname). Our options are:

  1. Just use whatever pathname is provided (i.e. leave it to the browser/user to encode).
  2. Try to normalize pathnames so that they are encoded. If the provided pathname is fully encoded, use as is, otherwise (if the pathname is not encoded at all, or some parts are not) encode the parts that are not yet encoded.

What do people here think? Can anyone foresee 2 causing issues?

With 2, I'm worried about bugs in the normalization process. This would be off loaded to encodeurl. Is it a completely safe operation? How likely is it to go wrong?

/cc @pshrmn

@bripkens
Copy link

bripkens commented Mar 25, 2019

Try to normalize pathnames so that they are encoded. If the provided pathname is fully encoded, use as is, otherwise (if the pathname is not encoded at all, or some parts are not) encode the parts that are not yet encoded.

IMHO this is impossible to implement. Let's say I have a path template like /article/:name with the name parameter having the value the best 10%. Since we are all fine developers we encode the value and build the full path. This results in:

$ '/article/' + encodeURIComponent('the best 10%')
'/article/the%20best%2010%25'

A generic piece of code cannot identify that this is already properly encoded. This is the whole reason why this issue here came up. encodeURI is already attempting such an implementation. For completeness:

$ encodeURI('/article/the%20best%2010%25')
/article/the%2520best%252010%2525

Just use whatever pathname is provided (i.e. leave it to the browser/user to encode).

The value is called pathname which translates in my head to path and thus to the way it is defined in the specification. IMHO it is fair to assume / require that developers properly escape when fiddling around with dynamic paths.

P.S. We are making heavy use of matrix parameters at Instana. Due to this bug we are operating a fork of history for the last 1.5 years which just removes the double encoding. It is working perfectly fine for us.

Screen Shot 2019-03-25 at 15 46 14

@pshrmn
Copy link
Contributor

pshrmn commented Mar 25, 2019

@bripkens encodeurl is designed to work with already/partially encoded strings.

With this sandbox you can see that it handles fully encoded, non-encoded, and partially encoded strings.

@Dalzhim
Copy link

Dalzhim commented Mar 25, 2019

@bripkens is right when he says it is impossible to implement. In order to correctly encode an URL, you must be able to prove the reverse transformation is always possible without any loss of information. Otherwise you might be causing prejudice to the application that expects to see the same URL that was given to the system originally.

Your sandbox demonstrates this impossibility. Three different source URLs are encoded to the same resulting URL. When you decode the final URL, you cannot possibly retrieve the correct source URL all of the time because you cannot guess the intent of the user. This will inevitably lead to subtle bugs that often go unnoticed.

One final example to demonstrate how encodeurl will screw up some URLs is shown in this sandbox I derived from yours. Assuming I create an article with the following title : %20 is a space %-encoded, the urlencode library sees it as being partially encoded when really, it is not encoded at all. The resulting encoded string is the following : %20%20is%20a%20space%20%25-encoded which then decodes to ** is a space %-encoded**, corrupting the title of my article by decoding the litteral %20 I originally used.

@pshrmn
Copy link
Contributor

pshrmn commented Mar 25, 2019

@Dalzhim I'll start out with your final example; please let me know if I am misinterpreting your point.

Let's start out by completely ignoring the encodeurl package.

If we use history.pushState to navigate to this /%20 is a space %-encoded, we end up with the following pathname:

history.pushState(null, "", "/%20 is a space %-encoded")
window.location.pathname === "/%20%20is%20a%20space%20%-encoded"

The browser does not encode percentage signs, so the initial %20 octet is used as provided. We also have an invalid URI because %-e is an invalid octet.

decodeURI(window.location.pathname) // this throws an URIError

To make the URI valid, we can pass it to an encodeURI call. This will encode %-en to %25-en as well as convert %20 to %2520%20. This will result in a pathname that can be decoded to the desired result.

history.pushState(null, "", encodeURI("/%20 is a space %-encoded"))
window.location.pathname === "/%2520%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/%20 is a space %-encoded" CORRECT

Now, getting back to encodeurl, what happens if we wrap the pathnames in encodeurl calls?

history.pushState(null, "", encodeurl("/%20 is a space %-encoded"))
window.location.pathname === "/%20%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/ is a space %-encoded" INCORRECT

history.pushState(null, "", encodeurl(encodeURI("/%20 is a space %-encoded")))
window.location.pathname === "/%2520%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/%20 is a space %-encoded" CORRECT

In the first case, we still have the incorrect pathname, but it is no longer an invalid URI because encodeurl correctly encodes %-en to %25-en. The issue is that encodeurl interpets the initial %20 as a valid octet.

In the second case, we get the same (correct) result as the plain encodeURI result; wrapping an encodeURI call in an encodeurl call does not have any issues with double encoding.

encodeurl offers partial protection, but is not capable of determining the users intent. The conclusion that I draw from this is that the user should always encode percentage signs themselves.

@Dalzhim
Copy link

Dalzhim commented Mar 25, 2019

@pshrmn : Your previous message is 100% correct. The user is the only one who has enough information to disambiguate any malformed URL. The first case ended up with a valid, but meaningless URL whereas the second example displays proper construction of the URL by the user, which makes it possible for the library to handle in a generic way. In other words, decodeURI(encodeURI(url)) === url must always be true, otherwise the error is the user's responsibility.

@OliverJAsh
Copy link
Contributor

Someone who knows better than me should raise a PR to encodeurl to mention these caveats…

@kelly-tock
Copy link

Can I ask before this issue, maybe rr 3 and below, what were the isssues with how it treated path params and what not? I don’t remember tbh having an issue until 4.

mjackson pushed a commit that referenced this issue Mar 26, 2019
billyvg added a commit to getsentry/sentry that referenced this issue May 21, 2019
The `history` package automatically tries to decode the URI and will break if the URI has an unencoded `%`.
  This problem usually props up when users manually manipulate the URL since in-app navigation will properly handle encoding.

  This change will redirect the user to the current path and strip all query strings.

  See remix-run/history#505 - They merged a change that would make the `history` package not always attempt to decode the URI, but requires a major version bump (and corresponding updates to react-router).

  Fixes JAVASCRIPT-7W0
billyvg added a commit to getsentry/sentry that referenced this issue May 21, 2019
…#13306)

The `history` package automatically tries to decode the URI and will break if the URI has an unencoded `%`. This problem usually props up when users manually manipulate the URL since in-app navigation will properly handle encoding. Currently, when this happens, the user is greeted with a big empty page.

Instead, this change will redirect the user to the current path and strip all query strings so that we're able to at least load the Sentry app.

See remix-run/history#505 - They merged a change that would make the `history` package not always attempt to decode the URI, but requires a major version bump (and corresponding updates to react-router).

Fixes JAVASCRIPT-7W0
Fixes JAVASCRIPT-5XJ
@lock lock bot locked as resolved and limited conversation to collaborators May 25, 2019
forl pushed a commit to forl/history that referenced this issue Aug 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.