Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not set query parameter when forms are set #111

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

h-otter
Copy link

@h-otter h-otter commented Jun 9, 2022

Thank you for the useful project!

This PR changes to strictly determine whether to pass form values ​​or query parameters to the backend.

In the current behavior, when prom-label-proxy receives form value match[]=up&namespace=default as POST /api/v1/series, prom-label-proxy sends request to the backend with query parameter match[]= and form values match[]={__name__="up",namespace="default"}. At that time, the backend returns the result of match[]= which is set as the query parameter. Since we are using VictriaMetrics as a backend, the behavior in which query parameters take precedence over form values ​​may be due to the implementation of VictoriaMetrics.

@h-otter
Copy link
Author

h-otter commented Jul 7, 2022

Would you please take a look at this pull request?

@squat
Copy link
Member

squat commented Jul 7, 2022

I raised some related concerns a couple of years ago related to how we were not strictly enforcing both query parameters and form values. I think that anything that makes our proxy more strict in this regard (like this PR) will not only improve the predictability and compatibility but also the security.

@h-otter
Copy link
Author

h-otter commented Jul 22, 2022

Is it possible to be reviewed or get some advice about this, please?

@dupondje
Copy link

Was having an issue yesterday with prom-label-proxy which was caused by this.
It injected the match[]= into the GET arguments, but without the original values.
And also injected it into the POST form correctly. But the proxy did only pass the GET value to the backend, which caused incorrect queries to the backend.

This should be merged asap!

Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a serious regression in the enforcement code and is not what we want.

Currently, we always enforce labels on potential queries in the query portion of the URL because any request, whether or not it includes a form, can have a query in the query part of the URL. Then, if the request is a POST request, we will additionally enforce labels in the form. This ensures that all queries present in any part of the request will be protected.

This PR effectively allows a potential attacker to break this last point and enables an attacker to send a request with two queries: one malicious query in the query part of the URL and one dummy query in the request body; labels will only be enforced in the form body and will not be enforced in the query portion of the URL. Which query is actually parsed and processed by the API supplying metrics is entirely dependent on the language specification. Will the API read the non-enforced query from the URL or will it read the enforced query from the request body? There is no way to be certain.

This behavior was fixed in the prom-label-proxy two years ago and we should not regress on it.

TL;DR: we do not want to allow any queries that could potentially be read and processed by a backend API to pass without enforcement.

@dupondje
Copy link

@squat: I agree indeed! But it's more complicated.

Take you have a POST request with the following body:
match[]=up&namespace=default

With the current code this seems to generate the following url parameter:
match[]={namespace="default"}

And the correct POST body:
match[]={__name__="up",namespace="default"}

And finally the Simple Proxy in Golang seems to drop the POST body and use the url parameter.
Which causes the query to break!

I think we should somehow block match url parameters if it's a POST request?

@h-otter
Copy link
Author

h-otter commented Nov 21, 2022

This behavior was fixed in the prom-label-proxy two years ago and we should not regress on it.

I read the discussion in #53. I understand the security risk that not injected requests are sent to the backend. I agree that both query parameters and form values must be modified by injectMatcher.

However, in a system like VictoriaMetrics, where query parameters are evaluated first, there is an issue with unexpected responses. I consider that values to be sent to backends should be consistent. Either the requested query parameter or the form value is modified by injectMatcher, then is sent as both query parameter and form value to the backend. We can decide which of the requested values to adopt under the following conditions.

  • If it is a POST request and Content-Type is application/x-www-form-urlencoded, send requested form values.
  • In other cases, send requested query values.

How about this idea?

@dupondje
Copy link

@h-otter: If its a POST request its logic indeed to send all matchers via the Form Values.
But if there is a matcher send via a query param, then it should be injected also.
Now we just always inject a query param but without the original matcher appended to it, which break things.

@squat
Copy link
Member

squat commented Nov 21, 2022

I think we should somehow block match url parameters if it's a POST request?

Yes, this would be a potential solution: we chose one data transfer format to prioritize, and if a query is detected there, then we eliminate queries from other transfer formats.

In other words, we could say "forms take priority over query parameters" and then if we ever detect a form then we remove all query parameters that could include a Prometheus query.

@h-otter
Copy link
Author

h-otter commented Feb 15, 2023

I tried to reflect the results of the discussion in the code.
Does that make sense?

@h-otter
Copy link
Author

h-otter commented Oct 24, 2023

@simonpasquier I remember this PR.
Can I have your review to merge this issue? :)

I'm sorry for my poor English.

Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @h-otter I think this is not exactly what we need. Firstly, form data can be sent in a body with two common content types, which Golang supports and automatically parses: application/x-www-form-urlencoded and multipart/form-data. Secondly form data can be sent in bodies with many different methods: POST, PUT, and PATCH.
To move this PR forward, I would suggest handling these cases. Otherwise we are opening ourselves up to real.vulnerabilities where we only inject labels (which projects use to separate tenant data) in queries when they should be protecting forms.

@h-otter
Copy link
Author

h-otter commented Oct 25, 2023

Sorry, I can't figure out how to fix it. 😢

What are real vulnerabilities?
The risk of vulnerability is that prom-label-proxy sends a promql query with no matcher injected, isn't it?
I think that pattern can be solved by sending the same content to query and form data.

In that case, I need to decide what content type is set, from the following selections:

  1. keep the content type
  2. strict content type, such as application/x-www-form-urlencoded

The current patch keeps the content type.
Also, prometheus docs says:

You can URL-encode these parameters directly in the request body by using the POST method and Content-Type: application/x-www-form-urlencoded header.

So, the current patch sends form data if POST method and Content-Type: application/x-www-form-urlencoded.
However, I think there is no problem in sending form data even if any method and any content type.

How can I fix this PR, or are there any other problems? 😃

@squat
Copy link
Member

squat commented Oct 25, 2023

xref: observatorium/api#595

Here is an example of how easy it is to introduce vulnerabilities that allow tenants isolated by this proxy to view one another's data. Observatorium's isolation is built using prom-label-proxy code. The vulnerability they experience is something that we wanted to account for by always enforcing matchers no matter where they are, query or form

@h-otter
Copy link
Author

h-otter commented Oct 25, 2023

First of all, if this PR is having a negative impact on the development of prom-label-proxy, please feel free to close it.

The behavior of observatorium/api#595 looks like my first commit c118800.
I believe that this PR solves these vulnerability problems, not open new problems.
The problem is that an invalid matcher (injected empty matcher) is set to query param if inbounding form data.

The difference is that matchers of the inbound query params and inbound form data are merged.
Should I implement to merge these?
In my opinion, the merged matcher is an incorrect promql query for everyone, because clients who struct query params and form data should be different.
So there is no need to merge it, but it is not a strong opinion.

@douglascamata
Copy link

I think we should be enforcing on both query params and form data, if both are present. Then it's up to the servers to decide which one they give priority to (i.e. Thanos prefers form data and VictoriaMetrics prefers query params).

I'm open to working on the fix here, as long we agree on the solution.

@h-otter
Copy link
Author

h-otter commented Oct 25, 2023

I think we should be enforcing on both query params and form data, if both are present.

In that case, we can choose not to insert anything with injectMatcher when there is no match[]. (revert all and delete this line https://github.com/prometheus-community/prom-label-proxy/blob/main/injectproxy/routes.go#L571 )
Are there any problems with such a change?

@squat
Copy link
Member

squat commented Oct 25, 2023

I think we should be enforcing on both query params and form data, if both are present. Then it's up to the servers to decide which one they give priority to (i.e. Thanos prefers form data and VictoriaMetrics prefers query params).

Agreed. This was exactly the approach we took 2+ years ago when I made this PR: #53

In the PR, I mention that:

POST requests can send two queries, one in the POST body and another in the URL query string; the Prometheus API's implementation implicitly chooses the query from the POST body, but relying on this implicit behavior could lead to a potential vulnerability if the behavior ever changes so the code today overrides queries in both locations independently

@squat
Copy link
Member

squat commented Oct 25, 2023

@h-otter I like your proposal:

In that case, we can choose not to insert anything with injectMatcher when there is no match[]. (revert all and delete this line https://github.com/prometheus-community/prom-label-proxy/blob/main/injectproxy/routes.go#L571 )
Are there any problems with such a change?

Let's do exactly that; keep the enforcement logic as is to always enforce with the caveat that if there is no match[] then don't do anything. If we have that change then I think all of my concerns for security and your concerns for interoperability are satisfied.

@douglascamata
Copy link

douglascamata commented Oct 25, 2023

@squat @h-otter isn't this proposal a bit dangerous? If we do nothing when there's no match[], then one uses the Prometheus client-golang lib and do a call like the ones below we would effectively ignore the enforcement:

_, _, _ := v1.NewAPI(apiTest).LabelNames(context.TODO(), []string{""}, now.Time().Add(-5*time.Minute), now.Time())
_, _, _ := v1.NewAPI(apiTest).LabelNames(context.TODO(), []string{}, now.Time().Add(-5*time.Minute), now.Time())
_, _, _ := v1.NewAPI(apiTest).LabelNames(context.TODO(), nil, now.Time().Add(-5*time.Minute), now.Time())

@douglascamata
Copy link

These 3 cases are even more problematic because parser.ParseExpr will produce an error. But, according to Prometheus' docs, match[] is only required in the series call.

So in the label names and values calls, we need to inject the matcher when if match[] is not present.

@h-otter
Copy link
Author

h-otter commented Oct 26, 2023

These 3 cases are even more problematic because parser.ParseExpr will produce an error. But, according to Prometheus' docs, match[] is only required in the series call.

So in the label names and values calls, we need to inject the matcher when if match[] is not present.

That's definitely a problem. There is a workaround to not inject based on match[] only for the series endpoint. However, I don't think it's an essential solution, because endpoints other than series do not resolve the issue.

The challenge is that prom-label-proxy doesn't know whether the user's request is stored in form data, query param, or both. There are these choices:

  1. inject matchers into both anytime (main branch)
    • good: not dangerous
    • bad: preferred query params system (like VictriaMetrics) does not work
  2. inject matchers into both anytime with merged values (similar to Fix enforcement of query matchers for series, label_values, and label_names calls observatorium/api#595)
    • good: consistent
    • bad: large query cannot handle with application/x-www-form-urlencoded
  3. inject matchers if match[] is set (do not set query parameter when forms are set #111 (comment))
    • good: clearly to implement
    • bad: dangerous at not series endpoints
  4. inject matchers according to the request is POST method and Content-Type: application/x-www-form-urlencoded (current patch)
    • good: clearly to implement
    • bad: cannot enforce on both query params and form data
  5. inject matcher if any params set
    • good: seems to best meet the demand
    • bad: hacky
    • example)
func injectMatcher(q url.Values, matcher *labels.Matcher) error {
        if len(q) == 0 || (len(q) == 1 && q.Has(matcher.Name)) {
            return nil
        }

	matchers := q[matchersParam]
	if len(matchers) == 0 {
		q.Set(matchersParam, matchersToString(matcher))
		return nil
	}

Which would you choose? I think 5th choice is better.

@douglascamata
Copy link

I think option 5 is the best indeed. I almost misunderstood what you meant because of the writing inject matcher if any params set, which to me means if len(params) > 0 { inject }, but reading through the code example you gave I understand it correctly. In fact, this is how my PR is currently solving it in Observatorium.

@douglascamata
Copy link

douglascamata commented Oct 26, 2023

At route level though, we might have to go with option 1 because of the fact that different backends will prefer query params and others will prefer form params. If we want to have a generic middleware for everyone, we need to update both.

On this regard, how this is bad for VictoriaMetrics? Injecting the matcher in both places won't cause any problem, no? VM prefers query params, so it ignores the one added in the form data and that's it. Can you give an example scenario of the trouble it could cause?

But this "generic" injectMatcher function is the ideal one.

@h-otter
Copy link
Author

h-otter commented Oct 26, 2023

but reading through the code example you gave I understand it correctly.

I didn't have the confidence to express it in words, so I wrote the code as well. I'm glad you understood correctly. 😄

On this regard, how this is bad for VictoriaMetrics?

#134 details our problem. My example scenario has occurred on a set of prom-label-proxy, VictoriaMetrics, and Grafana. Grafana sends match[] value in request body with Content-Type: application/x-www-form-urlencoded like:

POST /api/v1/series?tenant_id=test-tenant HTTP/1.1
Content-Type: application/x-www-form-urlencoded

match[]={__name__="up"}

Then prom-label-proxy sends a request to VictriaMetrics with injected match[] to both query params and form data. The query value is a result of injected to not match[]={__name__="up"}, but empty (equals match[]={}). The request will be:

POST /api/v1/series?match%5B%5D=%7Btenant_id%3D%22test-tenant%22%7D HTTP/1.1
Content-Type: application/x-www-form-urlencoded

match[]={__name__="up",tenant_id="test-tenant"}

As a result, since VictoriaMetrics prefers query params, VictoriaMetrics returns the result of match[]={tenant_id="test-tenant"}.

This behavior caused heavy performance issues, especially when sending a request to the labels endpoint. To assist the user, Grafana makes a request to the labels endpoint and displays label suggestions. At that time, VictoriaMetrics makes a response of match[]={injected-label="injected-value"}, which is not filtered enough, so it takes a very long time. This causes to stuck Grafana pages.

@douglascamata
Copy link

@h-otter I guess the route level enforcer has to be configurable then. There could be a setting that specifies the merge logic whenever a parameter is find in both query and form body params. Options could be a bitmask representing:

  • merge or not.
  • favor query params or form body.

Examples:

  • merge|favor_query: when merge query and form params are found, merge them. Query has priority for matching keys.
  • favor_query: when merge query and body params are found, ignore form.

Analogous for favoring form params.

For default value, I will suggest merge|favor_form, as this is the behavior of every other tool I know except for VictoriaMetrics. No strong opinion on it though.

What do you think, @h-otter?

@h-otter
Copy link
Author

h-otter commented Oct 26, 2023

I can't figure out what the route level enforcer is. Where will the route level enforcer be configured, the options are set as prom-label-proxy flags on the starting process, header or query params in user's requests, or hard-code into these lines?

mux.Handle("/federate", r.el.ExtractLabel(enforceMethods(r.matcher, "GET"))),
mux.Handle("/api/v1/query", r.el.ExtractLabel(enforceMethods(r.query, "GET", "POST"))),
mux.Handle("/api/v1/query_range", r.el.ExtractLabel(enforceMethods(r.query, "GET", "POST"))),
mux.Handle("/api/v1/alerts", r.el.ExtractLabel(enforceMethods(r.passthrough, "GET"))),
mux.Handle("/api/v1/rules", r.el.ExtractLabel(enforceMethods(r.passthrough, "GET"))),
mux.Handle("/api/v1/series", r.el.ExtractLabel(enforceMethods(r.matcher, "GET", "POST"))),
mux.Handle("/api/v1/query_exemplars", r.el.ExtractLabel(enforceMethods(r.query, "GET", "POST"))),

Your merge logic and options sound good because they cover all of the use cases to merge query params and form data. 👍

In my opinion, since prom-label-proxy is a component to enforce labels in a given PromQL, prom-label-proxy should not modify the user's request as much as possible. I don't know the behavior of other tools, although it is better if the upstream service decides whether to merge the query and form data or not. The 5th option is more precise in maintaining users' requests. An idea is to create a new component (something called prom-merge-proxy) to merge query params and form data.

@douglascamata
Copy link

douglascamata commented Oct 26, 2023

In my opinion, since prom-label-proxy is a component to enforce labels in a given PromQL, prom-label-proxy should not modify the user's request as much as possible. I don't know the behavior of other tools, although it is better if the upstream service decides whether to merge the query and form data or not. The 5th option is more precise in maintaining users' requests. An idea is to create a new component (something called prom-merge-proxy) to merge query params and form data.

I disagree. I think it's fine that prom-label-proxy is taught how it should modify the request to do what it's meant to do (enforce labels).

The issue is that different PromQL backends diverged on how they handle query params and form params. There are people out there building platforms on top of prom-label-proxy and these PromQL backends for internal and external customers. These customers are being able to punch through the enforcement of prom-label-proxy because of this logic divergence in the backends. The only way to make prom-label-proxy safe is to tell it what to do depending on the backend you run.

I heavily disagree that we need yet another proxy for this. It's one more thing to maintain, run, scan for CVEs, update, monitor, alert on, etc etc.

But I'm just someone from another project trying to upstream a fix. So I'm keen to listen to what maintainers and contributors believe this project should do and what should be another project. 😄

@douglascamata
Copy link

Where will the route level enforcer be configured, the options are set as prom-label-proxy flags on the starting process, header or query params in user's requests, or hard-code into these lines?

I think they could be cli flags on the prom-label-proxy binary and then passed down to the http handlers.

@h-otter
Copy link
Author

h-otter commented Oct 27, 2023

But I'm just someone from another project trying to upstream a fix. So I'm keen to listen to what maintainers and contributors believe this project should do and what should be another project. 😄

I think so, too. I don't have a strong opinion about merge either. I follow the maintainer's decision.

@squat, which option do you think is better in #111 (comment) ? And, configurable of the route level enforcer should be implemented?

@h-otter
Copy link
Author

h-otter commented Oct 27, 2023

The issue is that different PromQL backends diverged on how they handle query params and form params. There are people out there building platforms on top of prom-label-proxy and these PromQL backends for internal and external customers. These customers are being able to punch through the enforcement of prom-label-proxy because of this logic divergence in the backends. The only way to make prom-label-proxy safe is to tell it what to do depending on the backend you run.

I agree that in a multi-tenant environment, there is a risk of unmerged results compromising other tenants. On the other hand, modifying user requests risks causing unexpected behavior when an unknown upstream like VictoriaMetrics appears. That is a trade-off. Your proposal realizes that we can configure that trade-off, so it sounds good. However, I think it would be better to discuss this in another issue.

@squat
Copy link
Member

squat commented Oct 27, 2023

What is the benefit to making the prioritization logic configurable? Prometheus and Victoria metrics and Thanos all (implicitly) make decisions as to what to do when an ambiguous request arrives. Prometheus and Thanos implicitly do merge|favor_form and VictoriaMetrics either does favor_query or merge|favor_query. I think that this project could also simply make a final decision about how to handle that ambiguity and document how the proxy works and call it a day. IMO that solves everything and reduces complexity in the project:

  1. It's predictable
  2. It plays nicely with @h-otter's issue: if the proxy does merge|favor_form then we would not run into the issue of an underspecified query that returns too many results because clearly a user wrote that query.
  3. It's easy to implement
  4. It's secure: there will only ever be matchers in the query parameter or the form and the matchers will be enforced.

I vote for picking one behavior and applying it always, whether it's option 5 or something like merge|favor_form. Adding more configurability to this behavior does not seem worth the effort to me. Who will benefit from this? The only use case that I can imagine would benefit from additional configurability is a user who has a faulty stack that introduces queries in two places and prefers to configure the proxy rather than fix the rogue component introducing an extra query, right?

Signed-off-by: h-otter <kawahara_hiroki@cyberagent.co.jp>
Signed-off-by: h-otter <kawahara_hiroki@cyberagent.co.jp>
Signed-off-by: h-otter <kawahara_hiroki@cyberagent.co.jp>
Signed-off-by: h-otter <kawahara_hiroki@cyberagent.co.jp>
@h-otter
Copy link
Author

h-otter commented Oct 29, 2023

Thank you for your response! I see your point about not needing it configurable. Personally, I agree. In that case, should we further discuss which default behavior to adopt?

Anyway, I applied option 5 to the changes on 6e70621. Since the following block removes the proxy label from the query parameters, so changes become very small.

https://github.com/prometheus-community/prom-label-proxy/blob/main/injectproxy/routes.go#L187-L190

However, it seems difficult to fundamentally solve this problem. As you can see from the test failure, if query params and form data are not set, the matcher will be sent without injected. In addition, some endpoints (such as labels endpoint ) do not have the required fields, so option 5 does not cover all cases.

The problem is the behavior when values ​​are not set in query or form. If we respect the previous behavior of prom-label-proxy, it should inject matcher into only the query value. I would like to fix the behavior when the form value is only set. Summarized in a table, it looks like this:

len(query) len(form) Current behavior The behavior of 6e70621 Correct behavior
== 0 == 0 injected into query not injected injected into query
> 0 == 0 injected into query injected into query injected into query
== 0 > 0 injected into both injected into form injected into form
> 0 > 0 injected into both injected into both injected into both

Therefore, I created a37a998. The tests are passing, so we confirmed that much of the previous behavior is preserved. How about it?

@h-otter
Copy link
Author

h-otter commented Nov 29, 2023

@squat Could you check the comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants