Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem with decode regexp #466

Closed
wants to merge 1 commit into from
Closed

Conversation

fnnzzz
Copy link

@fnnzzz fnnzzz commented Sep 12, 2018

Hi, I found a problem in case, when to me from the back-end comes a cookie query-string like, where a part of the value is encoded in Cyrillic and is presented in lower case.

2018-09-12 22 58 14

I could not understand why the get() method returns undefined.

A little debugged made the situation clear - RegExp used in decodeURIComponent matches only lowerCase parts, therefore inside decode() function is throw an error URI malformed (because of an incorrect string).

image
2018-09-12 22 58 27

So I send you my PR with a simple solution to this problem.

@mac2000
Copy link

mac2000 commented Sep 13, 2018

And where is test to catch regression in future?

@mac2000
Copy link

mac2000 commented Sep 13, 2018

Just to clarify problem exists in .net implementation of url encode - https://stackoverflow.com/questions/918019/net-urlencode-lowercase-problem

And RFC https://tools.ietf.org/html/rfc3986 says:

The uppercase hexadecimal digits 'A' through 'F' are equivalent to
the lowercase digits 'a' through 'f', respectively.

@FagnerMartinsBrack
Copy link
Member

FagnerMartinsBrack commented Sep 14, 2018

@mac2000 Thank you for the research!
@fnnzzz Thanks for the very clear report!

The RFC says:

For consistency, URI producers and
normalizers should use uppercase hexadecimal digits for all percent-
encodings

But it also says:

The uppercase hexadecimal digits 'A' through 'F' are equivalent to
the lowercase digits 'a' through 'f', respectively. If two URIs
differ only in the case of hexadecimal digits used in percent-encoded
octets, they are equivalent.

I can see a coupe of options:

  1. Accept lowercase percent-encoding data to make it possible for backend systems to integrate with the standard js-cookie encoding algorithm. We still need a test for it as @mac2000 pointed out and fix the tests that are failing the build.
  2. Do not accept lowercase percent-encoding data. Document somewhere that if you have a custom server, we recommend the user to use a converter instead and leave the standard js-cookie encoding/decoding algorithm only for compliant servers or to/from js-cookie interchange.

Be aware that option 1 might have other edge-case regressions.

Which one should we go for?
Ping @carhartl

@mac2000
Copy link

mac2000 commented Sep 14, 2018

@FagnerMartinsBrack thanks for your participation but I am kind of not agree with your point about relaying on undefined

Both decodeURIComponent('%D0%A5') and decodeURIComponent('%d0%a5') do the work - this is what should be expected

And if someone relaying on this - they are definitely doing something wrong

The right question is what should be done to regressions, but this is a place where only you guys can help

@mac2000
Copy link

mac2000 commented Sep 14, 2018

While writing this I did realize to my self that there is a chance to solve a problem in a following way - hexies are going by two pairs (at least for cyrillic) so may be we can figure out how to make regexp something like (%[0-9A-F]{2}){2}? (but still not sure if this is a case for everything)

@carhartl carhartl added this to the v3.0.0 milestone Sep 16, 2019
@carhartl carhartl self-assigned this Sep 19, 2019
carhartl added a commit that referenced this pull request Sep 23, 2019
Lowercase escapes are legal, `decodeURIComponent()` supports them, and
we should too.

The most difficult part is to know when we are dealing with hexadecimal
pairs and when not. Supporting only uppercase characters made
interpreting strings slightly more constrained, nevertheless even in
this case we might run into a scenario where we encounter strings that
look like encoded characters but aren't, for instance "%A1".

The solution to this is to no longer permit strings with mixed unencoded
and encoded characters, which seems like a rather rare edge case to
support. Either a cookie value is a fully url encoded string (which is
ensured when we're writing the cookie via our own api) or we treat it as
non-readable otherwise. In such fully encoded strings "%" would appear
as "%25", e.g. "%A1" would look like "%25A1". In case someone needs to
work with such mixed encoding cookie values, they would need to resort
to implementing their own specialized converter.

Closing #466
@carhartl
Copy link
Member

Fixed: 18ed0fd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants