URL parser percent encoding corrupting query params #2883

Closed
ttys0dev opened this issue Oct 7, 2023 · 10 comments · Fixed by #2990


@ttys0dev

ttys0dev commented Oct 7, 2023

It appears that the url parser is corrupting query params by incorrectly percent encoding them.

I noticed there's this comment which indicates this percent encoding does not accurately follow the spec.

For example, if I have a URL with the query parameter type=application/pdf, it gets incorrectly encoded to type=application%2Fpdf, which causes the server to throw an error.

This behavior does not match requests, which doesn't corrupt the query params.

Possibly related: #2721 #2723 #2856 #2863 #2864
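
To illustrate the report above: RFC 3986 allows / and ? to appear unescaped in a query component, so an encoder can leave them alone. The following is a minimal sketch using only the standard library's urllib.parse.quote. QUERY_SAFE and encode_query are hypothetical names chosen for this example, and this is not httpx's actual implementation.

```python
from urllib.parse import quote

# Characters RFC 3986 permits unescaped in a query component, beyond the
# unreserved set that quote() never escapes: sub-delims plus ":", "@", "/", "?".
QUERY_SAFE = "!$&'()*+,;=:@/?"


def encode_query(raw: str) -> str:
    """Percent-encode only characters that are not valid in a query (hypothetical helper)."""
    return quote(raw, safe=QUERY_SAFE)


print(encode_query("type=application/pdf"))      # type=application/pdf
print(quote("type=application/pdf", safe="=&"))  # type=application%2Fpdf
```

Note that this helper still escapes a literal %, so it only suits input that has not been percent-encoded yet.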

@kevinlul

kevinlul commented Oct 21, 2023

I've found similar behaviour as well which may or may not be intended.

This URL (as a string) http://localhost:8000/api.php?gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%26_Limited_Lists passed to the client is not transformed.

This URL http://localhost:8000/api.php?gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%26_Limited_Lists&prop=revisions|categories is run through the encoder because | is unencoded, so the client ends up making the request GET /api.php?gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%2526_Limited_Lists&prop=revisions%7Ccategories. The & that was already encoded as %26 in the provided string gets double-encoded to %2526.

http://localhost:8000/api.php?gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%26_Limited_Lists&prop=revisions%7Ccategories is once again unaffected, making the request GET /api.php?gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%26_Limited_Lists&prop=revisions%7Ccategories.
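
One way to avoid the double encoding described above is to treat existing %XX sequences as already encoded and only escape what lies between them. This is a sketch under that assumption, built on the standard library only. quote_preserving_escapes and QUERY_SAFE are hypothetical names, and this is not what httpx does internally.

```python
import re
from urllib.parse import quote

# Matches an existing percent-escape such as %26 or %7c.
_PCT_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")

# Characters RFC 3986 allows unescaped in a query component.
QUERY_SAFE = "!$&'()*+,;=:@/?"


def quote_preserving_escapes(raw: str, safe: str = QUERY_SAFE) -> str:
    """Percent-encode raw, leaving existing %XX escapes untouched (hypothetical helper)."""
    # re.split with a capturing group puts the matched escapes at odd indexes.
    parts = _PCT_ESCAPE.split(raw)
    return "".join(
        part if index % 2 else quote(part, safe=safe)
        for index, part in enumerate(parts)
    )


query = "gcmtitle=Category:Yu-Gi-Oh!_Master_Duel_Forbidden_%26_Limited_Lists&prop=revisions|categories"
print(quote(query, safe=QUERY_SAFE))    # naive: %26 becomes %2526
print(quote_preserving_escapes(query))  # %26 is kept, | becomes %7C
```

The helper is idempotent, so running it on its own output changes nothing. The trade-off is that a caller who really means the literal text "%26" has to write it as %2526 themselves.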

@lloydzhou

#2844 (reply in thread)

@dhensen

dhensen commented Nov 9, 2023

What's the status? This is causing big problems for me, and I need a temporary workaround. Is there a downgrade path, or a workaround such as marking a URL as "raw" when you're sure it needs no further escaping? I'm on 0.25.1 and hit this badly when generating Dall-E images with OpenAI. The URLs look like this:

https://oaidalleapiprodscus.blob.core.windows.net/private/org-xyxgxKHDKa9h2SyZk74VacjY/user-6CZ2Z6xmqMCEoabzlurm1utH/img-Lpem8k7ZbBvQ3A8zD8jlnPtb.png?st=2023-11-09T22%3A41%3A21Z&se=2023-11-10T00%3A41%3A21Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-09T19%3A51%3A33Z&ske=2023-11-10T19%3A51%3A33Z&sks=b&skv=2021-08-06&sig=OlpIVcF7OR%2Bd56ERijj6wSynejpy0O33vdF17ZM167I%3D

Then it goes through httpx. I'm not sure what httpx makes of it, but the request definitely returns 403. Is it being double encoded because there is a slash in image/png after the start of the query string?

@lloydzhou

@dhensen
I got the same error when downloading Dall-E images.

I created a PR (#2929) to fix it.

@elupus

elupus commented Dec 4, 2023

Encoding the slash is not really a bug, but double-encoding something that is already encoded is. In that case it must decode before encoding.
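
A rough sketch of the "decode before encoding" idea, using only the standard library: split the query into parameters, decode each one, then re-encode it. normalize_query is a hypothetical name and this is not httpx's code. It assumes the query is a plain key=value list, and parse_qsl will also turn + into a space and rewrite a bare key as key=, so it is not a drop-in fix.

```python
from urllib.parse import parse_qsl, quote, urlencode


def normalize_query(query: str) -> str:
    """Decode each parameter, then re-encode it once (hypothetical helper)."""
    # Decoding per parameter keeps an escaped & (%26) inside its value,
    # instead of letting it turn into a parameter separator.
    pairs = parse_qsl(query, keep_blank_values=True)
    # quote_via=quote encodes spaces as %20; safe="/" leaves slashes alone.
    return urlencode(pairs, safe="/", quote_via=quote)


print(normalize_query("a=b%26c&d=e|f"))
# a=b%26c&d=e%7Cf
print(normalize_query("st=2023-12-04T21%3A30%3A24Z&rsct=image/png"))
# st=2023-12-04T21%3A30%3A24Z&rsct=image/png
```

Decoding the whole query string in one go would not work here, because %26 would become a bare & and split the value in two.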

@elupus

elupus commented Dec 4, 2023

@ttys0dev what server is this? It should url decode query parameters before checking them.

@ttys0dev
Author

ttys0dev commented Dec 4, 2023

> @ttys0dev what server is this?

I think apache or something, not something I control.

> It should url decode query parameters before checking them.

Well... it doesn't, so we need a way to not URL-encode.

@elupus

elupus commented Dec 4, 2023

> > @ttys0dev what server is this?
>
> I think apache or something, not something I control.
>
> > It should url decode query parameters before checking them.
>
> Well...it doesn't....so we need a way to not url encode.

Apache will just forward the query string to the web application, so it is the application, not the HTTP server, that does the decoding. That said, I also think we can revert parts of that change.

@ttys0dev
Author

ttys0dev commented Dec 4, 2023

> Well apache will just forward it to the web application. So not the http server that decode.

I think it's some ancient perl based web application.

@Shulyaka

Shulyaka commented Dec 4, 2023

> @ttys0dev what server is this?

According to curl, it's Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0

Here is an example of httpx double-encoding a URL returned by dall-e:

>>> httpx.URL('https://oaidalleapiprodscus.blob.core.windows.net/private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%3A30%3A24Z&se=2023-12-04T23%3A30%3A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%3A46%3A14Z&ske=2023-12-05T05%3A46%3A14Z&sks=b&skv=2021-08-06&sig=Vcgw/ybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%3D')
URL('https://oaidalleapiprodscus.blob.core.windows.net/private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%253A30%253A24Z&se=2023-12-04T23%253A30%253A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image%2Fpng&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%253A46%253A14Z&ske=2023-12-05T05%253A46%253A14Z&sks=b&skv=2021-08-06&sig=Vcgw%2FybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%253D')

The correct URL works:

$ curl -v 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%3A30%3A24Z&se=2023-12-04T23%3A30%3A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%3A46%3A14Z&ske=2023-12-05T05%3A46%3A14Z&sks=b&skv=2021-08-06&sig=Vcgw/ybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%3D'
*   Trying 20.150.70.100:443...
* Connected to oaidalleapiprodscus.blob.core.windows.net (20.150.70.100) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: C=US; ST=WA; L=Redmond; O=Microsoft Corporation; CN=*.blob.core.windows.net
*  start date: Oct 29 21:11:57 2023 GMT
*  expire date: Jun 27 23:59:59 2024 GMT
*  subjectAltName: host "oaidalleapiprodscus.blob.core.windows.net" matched cert's "*.blob.core.windows.net"
*  issuer: C=US; O=Microsoft Corporation; CN=Microsoft Azure TLS Issuing CA 05
*  SSL certificate verify ok.
* using HTTP/1.x
> GET /private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%3A30%3A24Z&se=2023-12-04T23%3A30%3A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%3A46%3A14Z&ske=2023-12-05T05%3A46%3A14Z&sks=b&skv=2021-08-06&sig=Vcgw/ybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%3D HTTP/1.1
> Host: oaidalleapiprodscus.blob.core.windows.net
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 3147861
< Content-Type: image/png
< Content-MD5: dBXv/x9P2sLLeXdxFqnDcQ==
< Last-Modified: Mon, 04 Dec 2023 22:30:24 GMT
< Accept-Ranges: bytes
< ETag: "0x8DBF5189BCB4C67"
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: c5e7db3d-601e-007d-1501-274c97000000
< x-ms-version: 2021-08-06
< x-ms-creation-time: Mon, 04 Dec 2023 22:30:24 GMT
< x-ms-lease-status: unlocked
< x-ms-lease-state: available
< x-ms-blob-type: BlockBlob
< Content-Disposition: inline
< x-ms-server-encrypted: true
< Date: Mon, 04 Dec 2023 22:32:51 GMT
<
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
* Failure writing output to destination
* Closing connection
* TLSv1.2 (OUT), TLS alert, close notify (256):

And the double-encoded one doesn't:

$ curl -v 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%253A30%253A24Z&se=2023-12-04T23%253A30%253A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image%2Fpng&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%253A46%253A14Z&ske=2023-12-05T05%253A46%253A14Z&sks=b&skv=2021-08-06&sig=Vcgw%2FybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%253D'
*   Trying 20.150.70.100:443...
* Connected to oaidalleapiprodscus.blob.core.windows.net (20.150.70.100) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: C=US; ST=WA; L=Redmond; O=Microsoft Corporation; CN=*.blob.core.windows.net
*  start date: Oct 29 21:11:57 2023 GMT
*  expire date: Jun 27 23:59:59 2024 GMT
*  subjectAltName: host "oaidalleapiprodscus.blob.core.windows.net" matched cert's "*.blob.core.windows.net"
*  issuer: C=US; O=Microsoft Corporation; CN=Microsoft Azure TLS Issuing CA 05
*  SSL certificate verify ok.
* using HTTP/1.x
> GET /private/org-LP5NjDb8vYKPpVtDZvsvnJzU/user-UEtOcH2nUvzzuecmNwOfvUg0/img-K2X7yJMMM2jrmbta14WUn3jS.png?st=2023-12-04T21%253A30%253A24Z&se=2023-12-04T23%253A30%253A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image%2Fpng&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-04T05%253A46%253A14Z&ske=2023-12-05T05%253A46%253A14Z&sks=b&skv=2021-08-06&sig=Vcgw%2FybHxYjADtWWSlFKkwNvXQKeRBCIvk33BAdzCTE%253D HTTP/1.1
> Host: oaidalleapiprodscus.blob.core.windows.net
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
< Content-Length: 408
< Content-Type: application/xml
< Server: Microsoft-HTTPAPI/2.0
< x-ms-request-id: d2f2295c-f01e-008b-6f01-2739d9000000
< x-ms-error-code: AuthenticationFailed
< Date: Mon, 04 Dec 2023 22:33:16 GMT
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:d2f2295c-f01e-008b-6f01-2739d9000000
* Connection #0 to host oaidalleapiprodscus.blob.core.windows.net left intact
Time:2023-12-04T22:33:16.5185562Z</Message><AuthenticationErrorDetail>Signature fields not well formed.</AuthenticationErrorDetail></Error>
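
To see which parameters actually change meaning between the two requests above, the decoded values can be compared. This is a small diagnostic sketch using only the standard library. changed_params is a hypothetical helper, and the query strings are shortened to three of the parameters from the URLs above.

```python
from urllib.parse import parse_qsl


def changed_params(original_query: str, sent_query: str) -> dict:
    """Return parameters whose decoded value differs between two query strings."""
    before = dict(parse_qsl(original_query, keep_blank_values=True))
    after = dict(parse_qsl(sent_query, keep_blank_values=True))
    return {
        name: (before[name], after.get(name))
        for name in before
        if before[name] != after.get(name)
    }


original = "st=2023-12-04T21%3A30%3A24Z&sp=r&rsct=image/png"
sent = "st=2023-12-04T21%253A30%253A24Z&sp=r&rsct=image%2Fpng"
print(changed_params(original, sent))
# {'st': ('2023-12-04T21:30:24Z', '2023-12-04T21%3A30%3A24Z')}
```

In this sample, rsct decodes to image/png either way, so the encoded slash is harmless to a server that decodes. The st value only changes because its existing %3A was encoded a second time.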
