Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Golang filter: provide a way to get all headers #29592

Open
spacewander opened this issue Sep 13, 2023 · 26 comments
Open

Golang filter: provide a way to get all headers #29592

spacewander opened this issue Sep 13, 2023 · 26 comments
Labels
area/golang enhancement Feature requests. Not bugs or questions.

Comments

@spacewander
Copy link
Contributor

Title: Golang filter: provide a way to get all headers

Description:

Describe the desired behavior, what scenario it enables and how it
would be used.

We can add a new method func (h *Header) GetAll(copy bool) map…. The method takes a boolean to indicate whether we should return a copy of internal headers. If the headers are not copied, it is the user's duty to ensure the headers won't be written later or accessed concurrently.

Here are the advantages:

  1. it is more effective than Range as we don't need to lock for long
  2. it is easier to write than Range if people want to dump all the headers
  3. Wasm provides a similar method so we can align to that implementation

If this method is added, we can retire the RangeWithCopy because if people are afraid of the deadlock, they can use GetAll with copy instead.

@spacewander spacewander added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Sep 13, 2023
@adisuissa adisuissa added area/golang and removed triage Issue requires triage labels Sep 13, 2023
@adisuissa
Copy link
Contributor

cc @doujiang24 @wangfakang @StarryVae golang's filter code-owners.

@doujiang24
Copy link
Member

I'm fine with GetAll + copy, but I'm not sure with nocopy, since nocopy will expose the internal type map[string][]string.

I was thinking of an optimization: maybe we could use a higher performance type instead of map[string][]string.
In the current map[string][]string, one unique header name, requires at least two GC pointers:

  1. One for the slice []string
  2. Another one for the string data of the value.

Compared to map[string]string, it requires one more GC pointer.

But, I haven't figured out a better way yet. So just a little worried about exposing the internal type to users.

@spacewander
Copy link
Contributor Author

map[string][]string is used as header because we follow the same type used by Go: https://github.com/golang/go/blob/495830acd6976c8a2b39dd4aa4fdc105ad72de52/src/net/http/header.go#L24.

If the value is string instead of []string, it will cause some trouble when handling repeated headers.

If nocopy is not needed, we can use Clone, which is the same as the http.Header type in the Go.

@doujiang24
Copy link
Member

yep, I agree that we should handle the repeated header correctly.
just a little worried about exposing the internal type to users, that will make it harder to optimize it in the feature.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 14, 2023
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 21, 2023
@willemveerman
Copy link
Contributor

I also think a method like GetAllHeaders would be useful.

I don't know much about C++ programming so apologies if the following paragraph is uninformed.

Presumably it would be faster to pass a data structure containing all the headers to the Golang plugin, rather than having the Golang plugin retrieve them from the HeaderMap using Range

Also, the issue with Range is that you iterate over all the headers, which is a waste of time because the special ones like scheme etc are already set in the header map as values - so it would be good to have a capability to get all the other headers

@phlax phlax removed the stale stalebot believes this issue/PR has not been touched recently label Apr 8, 2024
@phlax phlax reopened this Apr 8, 2024
@doujiang24
Copy link
Member

I also think a method like GetAllHeaders would be useful.

I don't know much about C++ programming so apologies if the following paragraph is uninformed.

Presumably it would be faster to pass a data structure containing all the headers to the Golang plugin, rather than having the Golang plugin retrieve them from the HeaderMap using Range

We cache the entire header map on the Golang side, in a single C++ Golang interchange.
Range doesn't mean getting values from C++ one by one, there won't be much performance difference.

Also, the issue with Range is that you iterate over all the headers, which is a waste of time because the special ones like scheme etc are already set in the header map as values - so it would be good to have a capability to get all the other headers

okay, you means all headers exclude the special ones like scheme, path and etc, right?

I think this might be a proper choice, it always do a copy for safety, thoughts?

func (h *Header) GetAll(skip_internal bool) map[string][]string

@willemveerman
Copy link
Contributor

willemveerman commented Apr 11, 2024

We cache the entire header map on the Golang side, in a single C++ Golang interchange.
Range doesn't mean getting values from C++ one by one, there won't be much performance difference.

OK, great - I wasn't aware of this.

okay, you means all headers exclude the special ones like scheme, path and etc, right?

Yes exactly, because the special ones are already available as attributes -

Scheme() string
Method() string
Host() string
Path() string

I think this might be a proper choice, it always do a copy for safety, thoughts?

That seems perfect to me. Will the looping occur on the golang side or the C++ side? Just thinking the optimal thing would be to do any sort of loop on the C++ side so the whole operation is as fast as possible.

@willemveerman
Copy link
Contributor

willemveerman commented Apr 11, 2024

Given the header map object is already cached, I wonder if the additional headers, i.e. the non-special ones, can't me made available in some sort of simple data structure in Golang - e.g. all_other_headers map[string][]string

This prevents the user of the filter needing to create and populate such a data structure themselves for every request

@doujiang24
Copy link
Member

Will the looping occur on the golang side or the C++ side? Just thinking the optimal thing would be to do any sort of loop on the C++ side so the whole operation is as fast as possible.

Golang side should be good enough, it's fast enough, no cgo call overhead.

This prevents the user of the filter needing to create and populate such a data structure themselves for every request

It's a trade-off, since we already have a map[string][]string cache for the whole headers.
This depends on how often the API will be used.

@willemveerman
Copy link
Contributor

It's a trade-off, since we already have a map[string][]string cache for the whole headers.
This depends on how often the API will be used.

Fair enough, is it possible to access the existing cache without creating another map[string][]string?

@doujiang24
Copy link
Member

Fair enough, is it possible to access the existing cache without creating another map[string][]string?

it could be done in the current Range api, but it contains the special headers.

@willemveerman
Copy link
Contributor

it could be done in the current Range api, but it contains the special headers.

Oh I see, I'm beginning to understand how it all works now.

So there's an existing map[string][]string under the hood, so to speak, and to access each element you can use Range

But the OP was proposing to write a func which would return a copy of the entire existing map[string][]string, is that right?

I think that would be a good idea because it would be much faster.

In some cases users will want to build other data structures using each header value as an attribute. With the existing Range approach we need to loop over every key, and also every value - so, in the the circumstance of repeated headers, a key will contain several values in the []string array and it will take time. However, with a func which returns map[string][]string we can pass the entire []string to the attribute without having to loop over every item in the array.

@willemveerman
Copy link
Contributor

Also, could the same method be added to the cluster_specifier_plugin so we can get all the headers, then route to a particular cluster based on the headers of each request.

@doujiang24
Copy link
Member

But the OP was proposing to write a func which would return a copy of the entire existing map[string][]string, is that right?

yep.

Also, could the same method be added to the cluster_specifier_plugin so we can get all the headers, then route to a particular cluster based on the headers of each request.

Do you really want all of the headers? maybe HeaderMap.Values(key string) []string works for your? to get the values of a specified key.

@willemveerman
Copy link
Contributor

willemveerman commented Apr 23, 2024

Do you really want all of the headers? maybe HeaderMap.Values(key string) []string works for your? to get the values of a specified key.

Well if it works like that I don't see the point of the plugin.

Let's say I use the golang http plugin to dynamically set a header, e.g. X-SPECIAL-HEADER: cluster-z, and then use the golang cluster_specifier plugin to get the value of that header and route to cluster-z

I can already use envoy's config to route to a cluster based on the value of X-SPECIAL-HEADER, so there's no point in using the golang cluster_specifier plugin because it will be slower than using native envoy config.

On the other hand, if I can use one plugin to read all the headers and then route to a particular cluster it's much faster; I can read the existing headers, decide on a cluster and then route to it, without needing to add X-SPECIAL-HEADER: cluster-z


Edit, I made a mistake. With Envoy's native config you can set a route based on a header value, but you can't dynamically route to any cluster based on a header value - which is something you can do with the golang cluster_specifier plugin.

So I guess the general idea is you can set a header in the golang http plugin, and then route to a cluster using that header with the golang cluster_specifier plugin.

Personally, I think it would be great if you could retrieve all headers in the golang cluster_specifier plugin because then the flow will be much faster. You can just read the headers, decide on a cluster, and route to it - without having to add a header or invoke two separate plugins in the request flow.

@doujiang24
Copy link
Member

but you can't dynamically route to any cluster based on a header value

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

I think it would be great if you could retrieve all headers in the golang

yep, GetAll could be better when you need all headers.
If you only need some specified headers with pre-known header names, Get/Values should be works too.

Anyway, there is no disagreements about adding the new API:

func (h *Header) GetAll(skip_internal bool) map[string][]string

Would you like to have a try? Seems it could be pure Golang implementation, no C++ code changes need.

@willemveerman
Copy link
Contributor

willemveerman commented Apr 26, 2024

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

Ah yes! Thank you.

So you can set a header, then route based on that header's value using Envoy's native config.

So, if the golang cluster_specifier can also access the underlying map[string][]string header cache then this would be the fastest mechanism for routing using a golang plugin, without needing to call another plugin:

  1. Read all headers by accessing HeaderMap map[string][]string
  2. Decide on a cluster
  3. Return chosen cluster name

Therefore, could 1. be added to golang cluster_specifier?

In the mean time I will try the golang implementation you mentioned in for the golang http plugin.

@doujiang24
Copy link
Member

Therefore, could 1. be added to golang cluster_specifier?

yep. it should be a common API for HeaderMap.

In the mean time I will try the golang implementation you mentioned in for the golang http plugin.

Cool!

@willemveerman
Copy link
Contributor

I thought about it and I realised that it probably doesn't make sense to exclude the internal/special headers. This is because the user will most likely want all the headers - the only reason to exclude them is that in the case of Range it would be faster to do so. But if we're passing an entire copy of the underlying map[string][]string there's no time penalty incurred if we pass all the headers.

The draft PR is here - do you have any thoughts? It's very simple. #33821

@willemveerman
Copy link
Contributor

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

Noticed that cluster_header actually doesn't work for this use-case, as detailed here: #33878

@doujiang24
Copy link
Member

the only reason to exclude them is that in the case of Range it would be faster to do so.

okay, it's really not a good idea to design API by the performance that you thought. As said previously, headers already cached in Golang side, there shouldn't be much performance diff.

The draft PR is here - do you have any thoughts? It's very simple. #33821

added some comments there.

@willemveerman
Copy link
Contributor

Therefore, could 1. be added to golang cluster_specifier?

yep. it should be a common API for HeaderMap.

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

@doujiang24
Copy link
Member

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

maybe you can copy some code from golang http plugin, I'm happy to help if you need it.

@willemveerman
Copy link
Contributor

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

maybe you can copy some code from golang http plugin, I'm happy to help if you need it.

Yes I thought I might be able to do that.. OK, I'll give that a go in another PR and then tag you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/golang enhancement Feature requests. Not bugs or questions.
Projects
None yet
Development

No branches or pull requests

5 participants