Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post Request Body must be read #558

Open
bdurrer opened this issue May 10, 2022 · 8 comments
Open

Post Request Body must be read #558

bdurrer opened this issue May 10, 2022 · 8 comments
Labels
area:core Issue with EmbedIO core (server, modules, etc.) breaking Requires one or more breaking changes. enhancement pinned Pinned issues are not closed automatically when stale. v3.x

Comments

@bdurrer
Copy link

bdurrer commented May 10, 2022

Describe the bug
It appears that C# HttpClient throws an exception when EmbedIO answers to an POST request without reading the sent request body first.

To Reproduce
Steps to reproduce the behavior:

  1. Take a look at the POC: https://github.com/bdurrer/embedio_post_test
  2. Start the server, then run the client
  3. the 4th request fails
  4. See error

Expected behavior
Maybe EmbedIo has to read the body if the user's code did not do that specifically. Note that HttpContext.GetRequestBodyAsStringAsync(); and its friends can be called even on requests that cannot have a body (e.g. GET), so a generic solution for all verbs should be possible

Setup:

  • OS: Windows
  • C# HttpClient on .netFramework 4.7.2
  • EmbedIO 3.4.3
@rdeago rdeago added enhancement v3.x area:core Issue with EmbedIO core (server, modules, etc.) pinned Pinned issues are not closed automatically when stale. breaking Requires one or more breaking changes. labels May 10, 2022
@rdeago
Copy link
Collaborator

rdeago commented May 10, 2022

Hello @bdurrer, nice to see you opened an issue here after we discussed this on Slack.

It makes sense (kind of) that HttpClient wants the request body to be read: for the connection to be reused for another request, it must obviously be flushed in both directions.

On Slack you also mentioned how Postman is not affected by this problem. I think this is because Postman sends a single request and does not care about reusing the connection.

Although a ReadAndDiscardRequestBody() method would work as a workaround, the real solution is automatically reading the request body (if not already read) before sending response headers.

Better yet, we could read the request body before even trying to route the request; that's what PHP does IIRC. Obviously PHP has a configurable limit on request body size, which we could implement too. PHP also automatically parses uploaded files (I think they are exempt from the body size limit btw) and stores them in a temporary directory; that, too, would be nice to have, and maybe even necessary.

The problem lies in the "before sending response headers" part, as we don't know when exactly HttpListener will send them, unless we block all direct access to HttpListener methods. This is obviously a breaking change.

To recap, there are two possible solutions:

  1. Always read the request body before processing the request; make the data available through the HTTP context object; have a configurable size limit for request bodies; parse and save uploaded files as they don't count against the limit; (plus probably more stuff I'm not considering at the moment);
  2. Read the request body on demand; don't make the listener's Request and Response objects available to applications, ever, in any form, so as to be in control of when response headers are sent; throw an exception if request body contents are requested after being read and discarded; etc.

I personally lean on the side of 1: more complicated but it looks more like a solution as opposed to a convoluted workaround.

Thoughts, anyone?
@geoperez @mariodivece @radioegor146 @madnik7 @rocketraman @bufferUnderrun

@radioegor146
Copy link
Contributor

My personal opinion is that we should not read the whole body.
In the case of a big amount of requests with the handler not reading the body (for example returning 403 based only on cookies) will consume large amounts of memory and CPU time. If I understood correctly - the library is posing itself as an "embedded solution", and I've used it for some IoT devices.
Also, not reading the full body is a great solution to attacks, as we will consume any excess CPU/RAM only if required.

@geoperez
Copy link
Member

Hello, I agreed with @radioegor146. Access to I/O should be minimal to avoid performance issues.

@rdeago
Copy link
Collaborator

rdeago commented May 11, 2022

@radioegor146 @geoperez thanks a lot for chiming in.

Reading on demand is in fact simpler, and not reading at all is even simpler than that 😄 but the question is: what do we do if there is a request body and no handler reads it?

Some quick proposals:

  1. Leave everything as is and mark this edge case as a probable bug in HttpClient.
  2. After the request is handled, check the request's Content-Length header (good luck for chunked requests, but this is an edge case after all) and if there appears to be data that has not been read, and the connection is bound to be reused, read and discard the request body. Add an option to WebServerBase for the size limit of request bodies to mitigate possible attacks.
  3. Instead of reading an unread request body, set the response's Connection header to close and do not reuse the connection. This will probably not solve the problem with HttpClient, but it could be a sensible thing to do anyway. The problem here is, the response headers at that time will be already sent in 99.9% of cases. (Not in EmbedIO 4.0 though.)
  4. ...any more, anyone?

Also, is anyone willing to research whether this HttpClient behavior is a bug (so we're discussing the gender of angels, as the Italian saying goes) or is dictated by some RFC? Unfortunately I'm very tight on time these days.

@geoperez
Copy link
Member

LOL, why are Italians discussing that? Anyway, number two could be a better approach. A mix between reading if the length is small (set a custom threshold) and closing the connection if the length is shady.

@rdeago
Copy link
Collaborator

rdeago commented May 11, 2022

So, number 2 with number 3 as a fallback. Looks good. 👍

<OffTopic>

LOL, why are Italians discussing that?

🤣🤣🤣 @geoperez ¡gracias por alegrarme el día!

To "discuss the gender of angels" in Italian means to discuss an idle, useless topic, wasting time that could be better spent.
(Italian Wikipedia, translated by Google)
Italians tend to do it a lot when commenting soccer matches. 😁

</OffTopic>

On second thought, however, there are no angels involved here. EmbedIO currently has no protection against the type of attacks mentioned by @radioegor146. A malicious client may send a 4Gb POST body to an EmbedIO server running on a Raspberry Pi, and we'll happily try to read it all and probably deserialize it too! 😱💥💥💥🙈

Since this is not the same problem reported by @bdurrer (although clearly related), should I open another issue @geoperez?

@radioegor146
Copy link
Contributor

I'm staying with the second option. I'm sure, that if there is an HTTP request, you can obviously get the size of the data (by reading it at least partially).
And for chunked requests, can't we just read chunk size, skip chunk bytes, and so on?
Like we can read only some parts of data that provide length.

@geoperez
Copy link
Member

@rdeago probably yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core Issue with EmbedIO core (server, modules, etc.) breaking Requires one or more breaking changes. enhancement pinned Pinned issues are not closed automatically when stale. v3.x
Projects
None yet
Development

No branches or pull requests

4 participants