Fragmentation for h2 messages #1378
The issue is highly linked with #1379, #488, h2-h2 proxying and #687 (comment). Now we have combined logic for constructing h2 responses and framing them. The current logic looks like:
This approach doesn't respect not only the remote peer
My proposal: prepare the h2 message without frame headers inside
Then the resulting message is forwarded into
Looking closely, it isn't worth making
The work queue will run jobs not for separate skb lists but for entire connections. Since there are a lot of multiplexed streams inside one connection,
With this change in mind, proxy mode of operation for an h2 connection (without full assembling) will be very easy to implement: just push a new message into the connection and mark it incomplete. If a new message part is being processed, lock it. The work queue will skip locked streams and proceed with ready ones. To sum everything up, the full proposal is huge. But I can't see a good way to implement outgoing framing in the right way at a cheaper cost. |
I have only one objection: the proposed architecture doesn't account for TCP HoL blocking (see #1196) and is potentially racy: if we prepare frames on
The point is that we can frame data on receive window update (for HTTP/2, TCP, and QUIC), just like Linux skb coalescing works; if we do this asynchronously, then things can break. We had the same problem with TLS record slicing and made a solution. |
I have reviewed the approach. It must be done in several stages, without trying to beat everything at once. First stage: implement correct framing and remove chunked encoding for h2 connections. Second stage: h2 flow control. Third: prioritization.
First stage: framing
No hard changes are required, but #1378 and #1379 will be fixed.
Remove chunked encoding
Chunked encoding can't be forwarded to h2 receivers. It is possible to remove the encoding either on the receive or on the transmit path. Usually it makes sense to postpone heavy processing as far as possible, but stripping chunked encoding on the transmit path doesn't give any advantages. At the same time it is worth stripping the chunked encoding just before saving the response to the cache. In this case we postpone the fragmentation operation until the block-or-forward decision is made, and no 'de-chunking' will be required for responses served from the cache. To accomplish that we need a new function
Currently we need the function only for responses, and we always know that a response will be used for an http2 client. After full h2-proxy mode is implemented, the function can be used for stripping chunked encoding from requests before forwarding to h2 server connections.
After chunked encoding is stripped from the body, all we need to do is frame the body as usual for http/2 connections.
Frame http/2 messages
Message framing should happen in
Thus it's not enough to use an skb list and flags as the only arguments for
Since previously we pushed cloned skbs, or skbs detached from a message, to lower levels, there were no issues about lifetime: we could simply free the skbs. Now we have to implement reference counters inside
Message framing will take place in
At this stage we don't need to care about TLS frame sizes, MSS and other things. Generally speaking, nothing in Tempesta's behaviour will change. The WorkQueue egress scheduler will have the same prioritisation: a simple queue.
Tests: the current test infrastructure doesn't support h2 messages and h2spec is not designed for that, but we can use
Time estimation: 1 week +- 1 day.
Second stage: flow control
Flow control can be implemented independently from the framing (though previously I thought otherwise). This is the greatest part of the job. With h2 we have THREE flow controls:
Note that only DATA frames are subject to flow control. When the flow control window of a single stream is exhausted, we can continue with other streams. When the total flow control window is exhausted, we have to wait for a WINDOW_UPDATE frame. There was a notice about race conditions for window sizes. The H2 specification allows the receiver to push smaller window sizes at any time, but the receiver must be aware of those races and should accept frames of the previous sizes. Since transmission happens from the same CPU, we don't need to care about synchronization here. We don't need to take care of the TCP window size. It can be smaller or bigger than the h2 window size. That's generally OK: while the TCP window size is used by the kernels on sender and receiver to check that no data loss happens and no retransmission is required, h2 flow control allows synchronising performance and buffers between h2 applications. Here we have caveats. We need to switch between the h2 streams of one connection and choose one against the others. This is only possible when the WorkQueue works not with single messages in a queue manner, but with connections with egress stream queues. When the h2 receive buffers are exhausted, transmission must be paused until free space appears. (Responses without bodies can be sent out without any flow control.) This is exactly the #687 (comment) proposal. With this in mind the
The biggest challenge: an effective new WorkQueue implementation.
Tests: h2spec has some tests, but I'm not sure that is going to be enough. Also performance tests are required.
Time estimation: 2+ weeks.
Third stage: prioritization
Once the second stage is done, all we need is to change the list of streams ready to transmit to some other structure that allows us to choose streams depending on their priority. It reimplements only one function from the previous stage, but it is sophisticated enough to be split into a separate stage. |
I agree that it makes sense to store HTTP responses in the cache with removed chunked encoding.
Why not remove the encoding right while reading the response body and parsing the chunked encoding?
While on HTTP parsing stage in
Agree. The chunked encoding encodes chunks, i.e. it's quite likely that we have separate chunks in separate skbs, so in the fast-path code we can just move the skb head and tail pointers to remove the chunked encoding (for the chunk length and the trailing `\r\n` correspondingly) if we do this on the reading stage.
#1125 is not even scheduled for any of the releases. Also, an HTTP/1 client with an HTTP/2 server is quite a corner case, and in this case we can do the same: just remove chunked encoding from all requests and add it back if an upstream doesn't support HTTP/2.
Why not in the context of
Why? I think we should care about TCP state. Suppose HTTP/2 allows us to send a 4KB frame. If the TCP congestion window is, say, 2KB, then half of the HTTP/2 frame will be stuck in our send queue. If the TCP receive window announced by the peer is 2KB, then, in the same way, the second half of the frame will sit in our send queue waiting for the peer's ACKs. See https://www.slideshare.net/kazuho/reorganizing-website-architecture-for-http2-and-beyond
We do need to care about this, see above. Things will work, but clients will receive responses with large latency. Kazuho described the problem quite clearly.
Need to check the scenario carefully, but my assumption is that at the moment we have the following scenario:
The key point is whether
Actually, the requirement from #1196
must be done in this issue. There is nothing in common between the prioritization and TCP HoL blocking actually.
Not necessarily. This seems a required change for #687 (comment), but I don't see why we need to do this in the context of this task.
We have a separate #1196. For now we can send DATA frames sequentially or in round-robin fashion. Also
is a good strategy for now to schedule DATA frames from different streams. |
The HLD was actively discussed in the chat and we agreed to postpone the flow control implementation. The only thing required for now is to fix http2 message framing to produce frames that don't overflow the remote peer's settings (see functions
The proposed solution requires an architecture change, but not all the desires (some of them pretty old) were taken into account. Thus we preferred a quick fix now rather than a big architecture rework. It seems like we don't need to implement flow control right away, since modern browsers don't use per-stream flow control and use pretty big receive windows as per-connection flow control limits. |
Tried to load tempesta-tech.com content with Chrome and Firefox several times with the following configuration:
One time I saw an empty page, but that was probably an issue with the browser cache. The next tries succeeded, but I saw the following issues in dmesg:
Please check whether it's related to the #1400 pull request and, if not, create a separate issue for the problem. |
The first trace is not related to #1400; it also reproduces for me on current master, #1387 (comment). But the second is new to me. And multiple connection closes with real browser request parsing failures is also a new thing. I'll check that. |
Fixed by #1400 |
Scope
Amendment to #309. The whole #309 is huge and general, so let's create a new issue for every new feature/bug request.
h2 requires framing. It's implemented already, but it doesn't care about fragmentation. The remote peer will reject frames if they're bigger than the remote peer allows. By default the maximum frame size is 16384 bytes, but a client may enlarge that limit through a SETTINGS frame.
When we forward messages to the client, we must honour those limits and fragment too-big bodies and headers. I've added corresponding TODOs in the #1374 PR.
Since h2 is to be a first-class citizen, we can store predefined response bodies in already fragmented and framed form. All clients must support frames of 16384 bytes, so we don't need to adapt predefined response bodies for every client.
Testing
Clients must define different maximum frame sizes; responses mustn't use bigger frames. The following cases must be covered: