No distinction made between empty query string and undefined query string #293

Open
surfacepatterns opened this issue Mar 10, 2019 · 3 comments

Comments

@surfacepatterns

The problem is illustrated by the following example:

>>> from yarl import URL
>>> URL('/stuff?')
URL('/stuff')
>>> URL('/stuff')
URL('/stuff')

RFC 3986 defines a query string using the following BNF:

query       = *( pchar / "/" / "?" )

The distinction between an undefined query and an empty query is based solely on the presence of the '?' delimiter. yarl doesn't make that distinction, which means that libraries that make use of yarl (e.g. aiohttp) may also not provide a way to make that distinction.
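To make the distinction concrete, here is a minimal sketch that recovers it with the generic URI regular expression from RFC 3986, Appendix B. The helper name `raw_query` is hypothetical and is not part of yarl or the standard library; it returns `None` when the '?' delimiter is absent and an empty string when the delimiter is present but nothing follows it.

```python
import re
from typing import Optional

# Generic URI regex from RFC 3986, Appendix B.
# Group 6 is the optional "?query" part; group 7 is the query itself.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)


def raw_query(uri: str) -> Optional[str]:
    """Return the query component, or None if there is no '?' delimiter.

    Hypothetical helper for illustration only.
    """
    match = _URI_RE.match(uri)
    # Every group is optional, so the pattern matches any single-line string.
    return match.group(7) if match else None


print(raw_query('/stuff?'))         # '' (empty query)
print(raw_query('/stuff'))          # None (undefined query)
print(raw_query('/stuff?a=1#top'))  # 'a=1'
```

A '?' that appears only inside the fragment (for example `/stuff#frag?x`) is not treated as a query delimiter, which matches the RFC grammar.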

The lack of distinction can be problematic in cases where there's a dependency on the exact representation of the URL. For example, if an HTTP client sends a request and includes a cryptographic hash that takes the representation of the request target (path and query) into account, and the HTTP server sees a request target that does not include a '?' and attempts to authenticate the hash using the request target, then authentication will fail.
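As a rough illustration of that failure mode, the sketch below signs a request target with HMAC-SHA256. The scheme, the secret, and the function name `sign_request_target` are assumptions made for the example; the actual protocol in question is not specified here.

```python
import hashlib
import hmac


def sign_request_target(secret: bytes, request_target: str) -> str:
    """Return a hex HMAC-SHA256 over the exact request-target string.

    Hypothetical signing scheme, used only to illustrate the mismatch.
    """
    return hmac.new(
        secret, request_target.encode('ascii'), hashlib.sha256
    ).hexdigest()


secret = b'example-secret'  # placeholder value for illustration

# The client signs the target it intended to send, with an empty query.
client_signature = sign_request_target(secret, '/stuff?')
# The server verifies against the target it received, with the '?' dropped.
server_signature = sign_request_target(secret, '/stuff')

print(hmac.compare_digest(client_signature, server_signature))  # False
```

Because the signature covers the literal bytes of the request target, a single dropped '?' is enough to make verification fail.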

@aio-libs-bot

GitMate.io thinks possibly related issues are #22 (Support non-UTF8 query strings), #91 (Encode semicolon in query string), and #181 (Encoding already encoded strings).

@asvetlov
Member

I understand your pain but the problem is not easy.
yarl uses urllib.parse for parsing URLs.
The standard library doesn't distinguish an empty query from an absent one.
Theoretically, it is possible to replace urllib.parse.urlsplit with a custom implementation, but that would take a long time.
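For reference, the standard-library behavior can be checked directly. This is a small sketch using only `urllib.parse.urlsplit` with its default arguments; the variable names are illustrative.

```python
from urllib.parse import urlsplit

with_empty_query = urlsplit('/stuff?')
without_query = urlsplit('/stuff')

# Both results report the query as an empty string.
print(repr(with_empty_query.query))       # ''
print(repr(without_query.query))          # ''

# The two split results compare equal, so the '?' is lost after parsing.
print(with_empty_query == without_query)  # True

# Reassembling the URL does not restore the trailing '?'.
print(with_empty_query.geturl())          # /stuff
```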

Could you provide more info? Is it a client or server code?

@surfacepatterns
Author

I'm writing an aiohttp client. Given that any URL passed to an aiohttp.ClientSession object is converted to a yarl.URL, it's impossible to handle this situation properly without a cheap, super hackish workaround like the following:

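class _HackedUpQuery(str):
    """A str subclass that is always truthy, even when it is empty."""

    def __bool__(self):
        return True


# query_string is None when the URL should carry no '?' at all
query_string = '' if query_string is None else _HackedUpQuery(query_string)
# pass to yarl.URL.build(), and pass the URL object to aiohttp.ClientSession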

The hack makes me really nauseous, but it appears to work.

Note that I can work around the problem on the server side using aiohttp.web.BaseRequest.raw_path.
