
Adding streaming response to AdminClusters endpoint #33879

Open · miroswan wants to merge 70 commits into main from admin-clusters-streamed

Conversation

@miroswan (Contributor) commented Apr 30, 2024

Commit Message: Adding streaming response to AdminClusters endpoint
Additional Description: Leveraging Envoy::Server::Request to implement a streaming response for AdminClusters
Risk Level: medium
Testing: unit / integration / manual
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]: #33879
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@adisuissa (Contributor) commented:

Seems that this PR includes many file changes. As it doesn't have reviews yet, can you please rebase it?
/wait

@adisuissa self-assigned this on May 1, 2024
@adisuissa (Contributor) commented:

> Seems that this PR includes many file changes. As it doesn't have reviews yet, can you please rebase it?

Apologies, I misread the PR contents.
This is quite a big change. Can you please provide some background on the motivation behind this, and consider breaking it into smaller PRs?

@miroswan (Author) commented May 1, 2024

> Seems that this PR includes many file changes. As it doesn't have reviews yet, can you please rebase it?
>
> Apologies, I misread the PR contents. This is quite a big change. Can you please provide some background on the motivation behind this, and consider breaking it into smaller PRs?

  1. I've updated the PR description to explain which issue this fixes. It was assigned to me by @jmarantz. The motivation for the change can be found in the issue, but the TL;DR is that a request that needs to construct a response with many clusters can add unnecessary memory pressure; by streaming the response, we can reduce it.
  2. I don't think I'll be rebasing, given the guidance here: https://github.com/envoyproxy/envoy/blob/main/CONTRIBUTING.md#submitting-a-pr. If my reviewer (@tonya11en) requests a rebase, I'd be happy to do that.
  3. I won't be breaking this up right away, since I was asked not to do so until the code could be reviewed holistically.

@adisuissa removed the waiting label on May 1, 2024
@adisuissa (Contributor) commented:

Thanks for the info.
The rebasing request was an error on my part: the PR was so big that I assumed the large diff wasn't intentional.

Assigning @jmarantz, who may have the most context.
/assign @jmarantz

@adisuissa removed their assignment on May 1, 2024
@miroswan (Author) commented May 1, 2024

There are some aspects of this PR that reviewers should be aware of:

  1. There is an opportunity to reduce string formatting in the text processor by replacing calls to Buffer::Instance::add with Buffer::Instance::addFragments (see the sketch after this list). We can do this in this PR or reserve it for a follow-up PR.
  2. I noticed that the JSON encoder used prior to this PR has omit-empty behavior: values that are false, 0, or an empty object are omitted. I've taken care to replicate this behavior in this PR.
  3. I'd like to know whether there are any existing e2e tests or benchmarks against which I can verify the changes.
  4. The unit tests herein exercise the changes with more than one cluster, to ensure that repeated calls to Envoy::Server::Request::nextChunk work as expected.
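
For point 1, a minimal sketch of the add vs. addFragments difference, assuming Envoy's Buffer::Instance API; the renderClusterLine helper names are illustrative, not code from this PR:

```cpp
#include "envoy/buffer/buffer.h"

#include "absl/strings/string_view.h"

namespace Envoy {

// Before: each add() appends a separate piece, paying per-call overhead.
void renderClusterLineWithAdd(Buffer::Instance& response, absl::string_view name,
                              absl::string_view value) {
  response.add("\"");
  response.add(name);
  response.add("\":\"");
  response.add(value);
  response.add("\"");
}

// After: addFragments() appends all pieces in a single pass, avoiding
// intermediate string formatting and repeated buffer calls.
void renderClusterLineWithAddFragments(Buffer::Instance& response, absl::string_view name,
                                       absl::string_view value) {
  response.addFragments({"\"", name, "\":\"", value, "\""});
}

} // namespace Envoy
```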

@jmarantz (Contributor) commented May 1, 2024

You can do addFragments in this PR or a follow-up; if it adds risk, it's fine to do a follow-up.

I think you should add a new benchmark, in the style of test/server/admin/stats_handler_speed_test.cc.

I think there are probably some unit & integration tests already. A good way to find them is to inject garbage into your output, run all the tests, and see which ones fail :)

However, you can also look at test/integration/integration_admin_test.cc and the tests in test/server/admin/...
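
A rough, self-contained skeleton of such a benchmark, assuming Google Benchmark (which the admin speed tests use); renderCluster is a hypothetical stand-in, since the real benchmark would drive the /clusters handler:

```cpp
#include <string>

#include "benchmark/benchmark.h"

// Hypothetical stand-in for rendering one cluster entry; the real
// benchmark would issue GET /clusters through the admin handler.
static std::string renderCluster(int i) {
  return "{\"name\":\"cluster_" + std::to_string(i) + "\"}";
}

static void bmRenderClusters(benchmark::State& state) {
  const int num_clusters = state.range(0);
  for (auto _ : state) {
    std::string out;
    for (int i = 0; i < num_clusters; ++i) {
      out += renderCluster(i);
    }
    // Prevent the compiler from optimizing the rendered output away.
    benchmark::DoNotOptimize(out);
  }
}
// Sweep cluster counts to show how render cost and memory scale.
BENCHMARK(bmRenderClusters)->Arg(10)->Arg(1000)->Arg(100000);

BENCHMARK_MAIN();
```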

@jmarantz (Contributor) commented May 1, 2024

Oh, and you can add the new benchmark in a separate PR so we can see the before/after.

@miroswan (Author) commented May 1, 2024

@jmarantz I'll create a separate issue/PR to add a benchmark for the AdminClusters handler. We can get that merged first. I can then rebase those changes into this branch and make sure that performance is improved.

@miroswan (Author) commented May 1, 2024

@jmarantz Please assign #33918 to me when you have a moment. Thanks!

@miroswan force-pushed the admin-clusters-streamed branch 3 times, most recently from f5e12db to c2c81e9, on May 3, 2024 01:07
@jmarantz (Contributor) left a review:

Flushing comments for now.

source/server/admin/clusters_chunk_processor.cc (outdated, resolved)
source/server/admin/clusters_chunk_processor.cc (outdated, resolved)
source/server/admin/clusters_chunk_processor.cc (outdated, resolved)
source/server/admin/admin.cc (outdated, resolved)
source/server/admin/clusters_chunk_processor.cc (outdated, resolved)
@miroswan (Author) commented May 8, 2024

> Flushing comments for now.

Addressed your comments. Thanks.

@jmarantz (Contributor) left a review:

Looking great!

Flushing some more comments. Will continue to review in parallel.

source/server/admin/clusters_chunk_processor.h (outdated, resolved)
source/server/admin/clusters_chunk_processor.h (outdated, resolved)

An inline excerpt of the code under review:
void render(std::reference_wrapper<const Upstream::Cluster> cluster, Buffer::Instance& response);
void drainBufferIntoResponse(Buffer::Instance& response);
void finalize(Buffer::Instance& response);
void addAddress(Json::Streamer::Map* raw_host_ptr, const Upstream::HostSharedPtr& host,
A reviewer (Contributor) commented:

why are we passing by pointer here? In Envoy, non-const refs are preferred unless there's some special reason they should be pointers, in which case there should be comments.

@miroswan (Author) replied May 10, 2024:

So, the constructors for JSON maps and arrays in Envoy::Json::Streamer create unique pointers, named Envoy::Json::Streamer::MapPtr and Envoy::Json::Streamer::ArrayPtr. Within the code, I sometimes need to pass the map or array to a private helper function to do extra work without terminating it (termination happens during destruction). By passing the raw pointer, I can loan the map or array to the helper function so it can do its work without owning it, which would prematurely terminate the array or map with closing tokens.

I am open to recommendations; this just seemed like the most straightforward way to achieve this. If you think it's sensible, I'm happy to add comments to each of the private member functions that take a raw pointer. (A sketch of the pattern follows.)
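
A minimal, self-contained sketch of this ownership pattern, with a hypothetical JsonMap type standing in for Json::Streamer::Map; the unique_ptr owner emits the closing token in its destructor, so helpers borrow a raw pointer rather than take ownership:

```cpp
#include <iostream>
#include <memory>
#include <string>

class JsonMap {
public:
  explicit JsonMap(std::string& out) : out_(out) { out_ += "{"; }
  ~JsonMap() { out_ += "}"; } // Closing token emitted on destruction.
  void addEntry(const std::string& key, const std::string& value) {
    if (!first_) out_ += ",";
    first_ = false;
    out_ += "\"" + key + "\":\"" + value + "\"";
  }

private:
  std::string& out_;
  bool first_{true};
};
using JsonMapPtr = std::unique_ptr<JsonMap>;

// Helper borrows the map. If it owned the pointer, the map would be
// destroyed (and closed with "}") when the helper finished.
void addAddress(JsonMap* raw_map_ptr, const std::string& address) {
  raw_map_ptr->addEntry("address", address);
}

int main() {
  std::string out;
  {
    JsonMapPtr map = std::make_unique<JsonMap>(out);
    map->addEntry("name", "cluster_0");
    addAddress(map.get(), "127.0.0.1:8080"); // Loan, not transfer.
  } // Map closed here, on destruction.
  std::cout << out << "\n"; // {"name":"cluster_0","address":"127.0.0.1:8080"}
}
```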

source/server/admin/clusters_handler.cc (outdated, resolved)
source/server/admin/clusters_params.cc (resolved)
source/server/admin/clusters_request.cc (outdated, resolved)
test/server/admin/clusters_params_test.cc (resolved)
test/server/admin/clusters_params_test.cc (resolved)