Upgrade gorilla/mux to at least 1.7.4 #2705

Closed
loudmouth opened this issue Sep 17, 2020 · 4 comments

@loudmouth
loudmouth commented Sep 17, 2020

Expected Behavior

Fast performance using OPA's REST API

Actual Behavior

The REST API's performance degrades under load, and requests start timing out

Steps to Reproduce the Problem

  • OPA version 0.23.x

My team and I have been using OPA as the authorization system for our user-facing application by setting it up as a standalone, centralized service. We really love the Rego language and think it's also quite nifty that we can push whatever external data we need to OPA to make the correct authorization decisions when combined with input. Unfortunately, we have generally found that OPA's REST API really starts to degrade in performance with even a not-so-extreme workload.

We've tried many things to resolve the issue: boosting the resources on the k8s node(s) that OPA runs on and giving OPA exclusive access to those nodes, putting our OPA pods on a restart timer, rewriting the majority of our rules to take advantage of partial evaluation, and running OPA as a sidecar instead (since we still experienced the issue with the sidecar deployment, we reverted to a central OPA, which is easier to maintain with a small dev team). The last thing left for us to try is changing the data payloads we push to OPA and rewriting all of our policies to execute faster. However, the total amount of data we have is not that large, and we even experience this problem in our dev environment, which has an extremely small data set: at most a few hundred items across all the JSON array(s) that are pushed. Despite all these attempts, the performance pales in comparison to what we saw when we simply integrated OPA as a library. The one thing that reduced our rate of timeout failures the most was putting the pods running OPA on a restart timer every few hours.

Our best guess is that something is wrong with the HTTP API, but it's quite hard to pinpoint. We have observed that when OPA has responses that take more than 4 seconds, the CPU allocation is significant, but the memory footprint is relatively OK. We suggest updating the HTTP router, gorilla/mux, to at least v1.7.4, as its release notes describe some performance improvements we think might help.

@patrick-east
Contributor

It sounds like you have a pretty good way to reproduce the error. Have you checked whether the upgrade helps with the performance issues?

I don't think there is really much of a reason to avoid updating gorilla/mux, but if it doesn't help, there are more things we can look into to chase down where the time is being spent.

@srenatus
Contributor

> updating the http router, gorilla/mux, to at least v1.7.4 as they describe some performance improvements we think might help.

I'm curious about these. 😃 Looking at the release notes, they've improved the performance of the code path for parsing URL query parameters (gorilla/mux#544). (💭Do you use these a lot in your setup?) The other perf improvement seems to be gorilla/mux#516... no idea if that could be really impactful. But of course, no harm in updating.

@tsandall tsandall added this to TODO (Things That Should Be Done) in Open Policy Agent via automation Sep 21, 2020
@tsandall tsandall moved this from TODO (Things That Should Be Done) to In Progress in Open Policy Agent Sep 21, 2020
@tsandall
Member

After reading the release notes for gorilla/mux, I agree with the others that it's unclear whether upgrading will have a meaningful impact on performance. At the same time, it's easy to test out.

Without more information on the underlying issue (e.g., the rules and data set in use, example query inputs, resource limits on the deployment, etc.) it's difficult to provide more guidance.

If we don't hear back in the next week or so, let's close this.

@loudmouth
Author

Hey all,

We really appreciate you 3 taking the time to respond to the issue we created. We'll defer to your assessment of the release notes for gorilla/mux and conclude that this may not be the culprit. Since this issue was specifically about upgrading that dependency, I'll close this issue now.

While we have struggled with performance for a while and our best guess is that the problem lies in the HTTP layer and not the policy engine, this needs further investigation on our side, and we will share more concrete details about our setup at a later time if relevant.
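For anyone investigating a similar "HTTP layer or policy engine" question, one possible starting point is the per-request metrics that OPA's Data API can return when `?metrics=true` is added to the query. The sketch below is an assumption-laden illustration, not something discussed in this thread: the `split` helper and the sample payload are made up, and the metric names `timer_server_handler_ns` and `timer_rego_query_eval_ns` are taken from the OPA REST API documentation as I recall it, so they should be checked against the OPA version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataResponse models only the part of a Data API response used here.
type dataResponse struct {
	Metrics map[string]int64 `json:"metrics"`
}

// split returns the total handler time, the query evaluation time, and the
// remainder (handler minus eval), all in nanoseconds. Metrics missing from
// the response are treated as 0.
func split(raw []byte) (handler, eval, other int64, err error) {
	var r dataResponse
	if err = json.Unmarshal(raw, &r); err != nil {
		return 0, 0, 0, err
	}
	handler = r.Metrics["timer_server_handler_ns"]
	eval = r.Metrics["timer_rego_query_eval_ns"]
	return handler, eval, handler - eval, nil
}

func main() {
	// Made-up sample body (assumption), shaped like a response to
	// POST /v1/data/<path>?metrics=true.
	sample := []byte(`{
		"result": true,
		"metrics": {
			"timer_rego_query_eval_ns": 400000,
			"timer_server_handler_ns": 1000000
		}
	}`)

	handler, eval, other, err := split(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("handler=%dns eval=%dns non-eval=%dns\n", handler, eval, other)
}
```

The handler timer only covers time spent inside the request handler, so a large gap between client-observed latency and `timer_server_handler_ns` would point at routing, connection handling, or queueing, while a handler time dominated by evaluation would point back at the policies and data.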

Thanks again; I really appreciate you being awesome maintainers!

Open Policy Agent automation moved this from In Progress to Done Sep 24, 2020