Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable automatic pathname encoding #461

Closed
nil4 opened this issue Apr 12, 2017 · 18 comments
Closed

Disable automatic pathname encoding #461

nil4 opened this issue Apr 12, 2017 · 18 comments

Comments

@nil4
Copy link

nil4 commented Apr 12, 2017

In #442 and #445 automatic encoding of pathname was added. This change breaks my application and I could not find a good workaround except reverting to history@4.5.0.

The app uses react-router and URLs that look like /view/:id, where ids can contains # and other special characters (e.g. a link to id #abc is encoded <Link to="/view/%23abc" />). With history@4.5.0 and earlier, everything worked fine; the app takes care to correctly encode the id and links work.

After upgrading to v4.6.0 and v4.6.1, when clicking the link /view/%23abc (or when using history.push on the same URL), the browser address bar unexpectedly shows /view/%2523abc. Note the double encoding of the # character.

Below is a standalone repro of the issue, comparing pushState with history@4.6.1 push, against window.location.pathname.

To run it, save the file locally, then npm i http-server && node node_modules/.bin/http-server -a 127.0.0.1 -p 8080.

<!doctype html>
<html>
<body>
   <script src="https://unpkg.com/history@4.6.1/umd/history.min.js"></script>
   <script>
        window.history.pushState(null, null, '/%23pushState');
        console.log('after pushState', window.location.pathname, window.location.pathname === '/%23pushState');

        var h = window.History.createBrowserHistory();
        h.push('/%23history.push');
        console.log('after history.push', window.location.pathname, window.location.pathname === '/%23history.push');
   </script>
</body>
</html>

In the console output, note the double encoding of # when using history.push:

after pushState /%23pushState true
after history.push /%2523history.push false

When the first <script> is changed to reference history@4.5.0 the behavior is correct:

after pushState /%23pushState true
after history.push /%23history.push true

Please reconsider the automatic encoding approach, as applications cannot work around it. It should be left to the application to encode the URL correctly before passing it to this library.

@mjackson
Copy link
Member

Ping @pshrmn who wrote #442 and #445.

@pshrmn
Copy link
Contributor

pshrmn commented Apr 20, 2017

The issue seems to only be when the developer wants to include reserved characters (; , / ? : @ & = + $) in the pathname.

I think that a raw mode in which history doesn't encode/decode the pathname could be useful here, but generally speaking it is better to have encoding/decoding built in. Without automatic encoding/decoding, React Router devs have to use the encoded version of a path in their <Route>s. That is a real pain to anyone that uses unicode characters in their routes.

To be honest, I'm having a hard time understanding why the hash symbol is being encoded in the example. Is it just so that the value can be access as params.id instead of as location.hash?

@nil4
Copy link
Author

nil4 commented Apr 20, 2017

@pshrmn the fact that the path encoding is automatic and cannot be disabled or worked around is unexpected and makes this library incompatible with the browser API which it abstracts. I hope the code above demonstrates this issue, and I think this is a bug, not a feature.

The larger issue is that this behavior blocks valid use cases in a higher-level library such as react-router. In my app I need to generate URLs matching /view/:id, where the value of :id can include reserved URL characters such as #.

ABC is one example of an :id value, and its URL should be /view/ABC.
But #XYZ is also a possible :id value, and its URL should be /view/%23XYZ.
Also JD/2017 is a possible :id value, and its URL should be /view/JD%2F2017.

To be crystal clear, I am not using using hashes at all. So how do I go about this? Here is what I tried:

<Link to="/view/#XYZ" /> 
<!-- Expected and actual --> <a href="/view/#XYZ" />

Looks fine, but I need the ID value in the path, not the hash.

<Link to="/view/%23XYZ" /> 
<!-- Expected --> <a href="/view/%23XYZ" />
<!-- Actual -->   <a href="/view/%2523XYZ" />

This is how a URL would be specified for pushState, but due to unconditional auto-encoding this library double-encodes.

Let's specify pathname explicitly:

<Link to={{pathname: '/view/#XYZ'}} />
<!-- Expected --> <a href="/view/%23XYZ" />
<!-- Actual -->   <a href="/view/#XYZ" />

Again, not the expected result.

<Link to={{pathname: '/view/%23XYZ'}} />
<!-- Expected --> <a href="/view/%23XYZ" />
<!-- Actual -->   <a href="/view/%2523XYZ" />

What am I missing?

@pshrmn
Copy link
Contributor

pshrmn commented Apr 20, 2017

When history did not encode/decode, then a user who wanted routes with non-ASCII characters would have to set the path to be the encoded string

// before
<Route path='/%E6%AD%B4%E5%8F%B2' ... />
// after
<Route path='/歴史' ... />

What I was proposing was that for people who want to handle encoding/decoding themselves, history would provide a raw prop.

const history = createBrowserHistory({ raw: true });

In raw mode, createPath would not encode the pathname and parsePath would not decode the pathname.

@pshrmn
Copy link
Contributor

pshrmn commented Apr 21, 2017

Actually, @nil4 you're really only concerned with the encoding, right? Is the automatic decoding breaking anything for you?

I don't actually think that createPath needs to encode the pathname because the browser will handle that for us.

@merrywhether
Copy link

@pshrmn I'm in the same boat as @nil4, and yes, automatic decoding is fine, it's the encoding that is causing me issues.

@mjackson
Copy link
Member

@pshrmn I think you're right. The only thing we need to be doing is decoding, not encoding. The browser will handle that for us (like it was before we made the change!)

@pshrmn
Copy link
Contributor

pshrmn commented Apr 21, 2017

Okay, so I tested out what happens if just encodeURI was removed and it almost works. The one issue that I came across is if the pathname includes a percent sign.

// no encoding
history.pushState(null, null, '/99% of the time')
// window.location.pathname === "/99%%20of%20the%20time"
decodeURI(window.location.pathname) // throws a URIError

// encoding
history.pushState(null, null, encodeURI('/99% of the time'))
// window.location.pathname === "/99%25%20of%20the%20time"
decodeURI(window.location.pathname) // "/99% of the time"

It makes sense that a percent sign is not auto-encoded (should %20 be left as is or encoded to %2520?), but it rules out being able to rely on the browser to correctly encode pathnames for us. Also, from looking into the spec, the URI passed to history.pushState should be encoded anyways, so we should be making sure that the pathname is encoded prior to pushing.

As far as possible solutions go, these include:

  1. Revert to how it used to be by removing encoding/decoding from history. This would force React Router and any other projects that use history to manually encode/decode pathnames (or pass that requirement on to their users).

  2. Add a raw prop to the history constructors. Then, modify createPath to not encode the location's pathname when it is in "raw" mode.

  3. Similar to 2, but add a raw argument to push/replace. This would allow for more fine-grained control of when history should not encode. The function signature would be a bit ugly because there is already an optional second argument to push/replace, but it could work.

history.push({ pathname: '/99% of the time' }) // "/99%25%20of%20the%20time"
history.push({ pathname: '/%20abc' }, null, true) // "/%20abc"

In theory, there should also be the ability to not decode the pathname. The only time that I have seen this requested had to do with a validation string made up of random characters, which could include percent signs. I'm sure that there are other use cases, though, so we should also consider what to do with that.

@mjackson
Copy link
Member

mjackson commented Apr 21, 2017

Correct me if I'm wrong but could another option be for us to just remove the encodeURI and throw if someone gives us a bad pathname (i.e. with a % that isn't part of an encoding)? Seems like that's what the browser should be doing anyway since it can't actually decode without throwing.

history.push('/99% of the time') // throws "Your URL cannot safely be decoded. Please encode it properly first"

EDIT: We can detect invalid URLs by passing them straight to decodeURI which will throw

@pshrmn
Copy link
Contributor

pshrmn commented Apr 21, 2017

@mjackson I've got a branch that I was putting together, but hadn't submitted because of the percent issue. It doesn't have the warning if the path is bad, but otherwise should be good.

@mjackson
Copy link
Member

But do you think that's a good solution, @pshrmn? To throw if they give us a bad path?

@pshrmn
Copy link
Contributor

pshrmn commented Apr 21, 2017

I'll need to give it some more thought, but I think that this makes sense.

99.9% (:smirk:) of the time, users aren't going to run into this issue. When they do, the solution is as simple as manually encoding with encodeURI.

<Link to={{ pathname: encodeURI('/60%+of+the+time') }}>By Odeon</Link>

This has me wondering now about what should happen when a history instance is created and the window.location.pathname is malformed. Right now they'll just end up with a blank page and an error message in the console, which seems non-ideal.

Edit: for example https://reacttraining.com/react-router/web/guides/quick-start%

@mjackson
Copy link
Member

That same concern was brought up in #448. I had thought it was resolved but maybe not.

If I try putting random % into well-known URLs around the web, everyone handles them differently!

Lots of sites are OK with the borked URLs, so I'm not 100% sure what the difference is. It may just be different libraries that tolerate the % better than others.

Maybe in these cases we just warn and leave the URL encoded? Isn't that what people would want most of the time?

@mjackson
Copy link
Member

In any case I should say that I think the idea to throw when someone tries to push an invalid URL seems sound given the fact that bad URLs cause lots of problems.

@richmeij
Copy link

richmeij commented May 1, 2017

I came across this same issue because a URL parameter I use in contains a forward slash, which I encode myself.

I upgraded to React Router v4 and new history packages, and now it gets double encoded. If I leave the forward slash untouched, it is not encoded at all, which messes up my URL parameters.

I think the 'raw' option proposed here will fix my issue, as I can keep encoding the URL myself. Or am I missing an obvious fix?

@pshrmn
Copy link
Contributor

pshrmn commented May 1, 2017

@richmeij Instead of raw mode, I think that it is actually better to just pass whatever pathname the user provides directly to the browser. You can see the implementation over in PR #465.

@richmeij
Copy link

richmeij commented May 2, 2017

Check. That would work. Thanks!

@mjackson
Copy link
Member

This is fixed in #465

@lock lock bot locked as resolved and limited conversation to collaborators Jun 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants