New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support delayed deny in the RBAC network filter #33771
Comments
cc @lambdai |
Also want to add that L4 RBAC is likely work with a TLS connection, which is cpu extensive (1ms cpu time) |
Wouldn't it be better to apply pressure earlier, e.g. by not reading bytes and starting TLS handshakes when there's a flood of connections? A delayed deny would mean Envoy has to maintain the memory structures for the connection when we'd want to shed them quickly. |
@kyessenov I think we will need both. The dealyed deny in RBAC is specific to connections to be closed due to permission error, and will be more effective to reduce the CPU usage on Envoy in some situations, for example, some gRPC clients retry in a busy for-loop when it is closed by RBAC, this creates signigicant number of new connections (e.g. 400 per second per client) on Envoy for more CPU usage (we are not really worried about the memory as it doesn't look to be an issue in either case). A delayed deny will naturaully reduce how fast the client is to retry, then siginificantly reduce the CPU usage since there is much less new connection being created/closed at the same time. |
SG, although this principle of delayed close should probably be applied uniformly: TLS handshake failures, protocol errors, etc all fails in the same error domain. It might be better to handle it at listener or the HCM level. CC @yanavlasov |
"When the RBAC policy evaluation result is DENY. The RBAC network filter will close the TCP connection immediately. This doesn't handle very well for some clients that just retry with a new connection at a very high rate, and that could overload the Envoy proxy to high CPU usage." 1.what's the real reason here for envoy proxy to high CPU usage ? Just for handler some clients DENY and close the tcp connection ? As you had said some clients doesn't handler well and just retry with new connection. So can we think this is client issue ? 2.From Envoy Pov, may be some protection policy or connection limit should be assigned to the same client not just only add connection close delay. |
@wufanqqfsc The TLS handshake is CPU intensive. Both client and server. Imagine you have a service behind envoy with huge fan-in. It is a client issue, but you don't nessessarily have the full control of the clients. |
Yes, so i mean may be some connection limit policy can be assigned to the same client to avoid same client retry with the connection & closed by envoy. Such as we can limit the connection frequency for the same client if the RBAC is failed during some time slots. Anyway, add delayed deny may help but also cost memory to keep the connection context. |
If specified, the RBAC network filter will delay the specified duration before actually closing the connection. This implements envoyproxy#33771. Signed-off-by: Yangmin Zhu <ymzhu@uber.com>
If specified, the RBAC network filter will delay the specified duration before actually closing the connection. This implements envoyproxy#33771. Signed-off-by: Yangmin Zhu <ymzhu@uber.com>
yeah, we can definitily enable different protections at multiple layer and places depending on the actual situation, it's not a one for all solution. The memory cost should be very small and the benefit (as compared to not having the delayed deny) is well worth it. |
Title: Support delayed deny in the RBAC network filter
Description:
When the RBAC policy evaluation result is DENY. The RBAC network filter will close the TCP connection immedidately. This doesn't handle very well for some clients that just retry with a new connection at a very high rate, and that could overload the Envoy proxy to high CPU usage.
We propose extending the RBAC network filter to delay a small amount of time (for example, 500ms, this will be configurable) before closing the TCP connection. For those clients this will limit the rate it retries with new connection.
This is especially useful for the RBAC network filter, because unlike the RBAC HTTP filter, it can only close the TCP connection and doesn't have a good way to propagate back the permission denial error (that is not retryable) to the client.
This is the same feature as the connection_limit that also implements the delayed rejecting functionality.
The following is a proposed API change to the RBAC network filter:
[optional Relevant Links:]
The text was updated successfully, but these errors were encountered: