Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes API errors are not logged as errors #168

Open
rbayerl opened this issue Nov 15, 2022 · 5 comments
Open

Kubernetes API errors are not logged as errors #168

rbayerl opened this issue Nov 15, 2022 · 5 comments

Comments

@rbayerl
Copy link

rbayerl commented Nov 15, 2022

A recent change to the Kubernetes auth method changed error logs to debug logs. These errors can be raised for a number of reasons:

  1. The Kubernetes API timed out
  2. The Kubernetes API is broken
  3. The CA certificate does not match
  4. There's an issue with the request

Vault does not differentiate between these separate and very different issues and just always returns HTTP 403 Permission Denied. This wasn't a big deal when context could be found in the logs, but now those logs are gone too. This makes debugging issues much harder by requiring the log level to be set to debug constantly to catch these issues. Can we either:

  1. Change the log level back to error (easiest)
  2. Include the error response in what is sent back to the client
  3. Send the appropriate response code to the client (instead of just Permission Denied for any error)
  4. Both 2 and 3 (best)

I'm happy to open a PR and implement the solution, but it would be nice to have an idea of which option the maintainers prefer.

@tomhjp
Copy link
Contributor

tomhjp commented Nov 15, 2022

Thanks for the issue, I think this is a good one to discuss and improve. The UX concerns you highlight are legitimate, but we also need to balance that against the security requirements, which are a bit tricky because this error and log message gets triggered from an unauthenticated endpoint. I think that probably rules out option 2 altogether, because we want to be especially cautious not to leak any information on unauthenticated endpoints.

Option 3 is kind of in the same bucket, but feels not as bad because we could guarantee that we only tell the client it is misconfigured, but won't include any potentially leaky information in an error string. What specific error code would you suggest?

Option 1 allows unauthenticated clients to spam Vault server logs with errors, which isn't ideal. Option 5, we could also try to do some classification on the error and log it as an error if we're confident it's a misconfiguration, but otherwise log as debug. WDYT?

@rbayerl
Copy link
Author

rbayerl commented Nov 15, 2022

Thanks for the issue, I think this is a good one to discuss and improve. The UX concerns you highlight are legitimate, but we also need to balance that against the security requirements, which are a bit tricky because this error and log message gets triggered from an unauthenticated endpoint. I think that probably rules out option 2 altogether, because we want to be especially cautious not to leak any information on unauthenticated endpoints.

Good call. I didn't really consider the security implications - which is why I raised this for discussion.

Option 3 is kind of in the same bucket, but feels not as bad because we could guarantee that we only tell the client it is misconfigured, but won't include any potentially leaky information in an error string. What specific error code would you suggest?

Response codes are hard when Vault is just acting as a passthrough. Calling it out as a client side error typically means clients won't retry, though. On the other hand calling it a server side error might trigger some team's alarms for monitoring when Vault is malfunctioning...which is not strictly the case here (Vault is fine but a dependency is broken). Maybe 504 Gateway Timeout would be most appropriate?

Option 1 allows unauthenticated clients to spam Vault server logs with errors, which isn't ideal. Option 5, we could also try to do some classification on the error and log it as an error if we're confident it's a misconfiguration, but otherwise log as debug. WDYT?

Option 5 sounds like a great solution. With error classification we could also send different response codes to the client if that's acceptable as well 👍.

@rbayerl
Copy link
Author

rbayerl commented Nov 17, 2022

@tomhjp are we in agreement on option 5? Would you like me to write up a PR or does someone else want to handle this?

@roy-work
Copy link

Option 1 allows unauthenticated clients to spam Vault server logs with errors, which isn't ideal.

If this is a serious concern, rate limit the logs.

But having Vault silently swallowing internal errors while processing auth just cost me two days of debugging.

@tanishq-dubey
Copy link

Agree with @roy-work , lack of errors being bubbled up makes the logging effectively useless and requires the user to drop to a DEBUG or TRACE level, at which point they are already being spammed with logs.

Related to #184 where the DEBUG log contains the endpoint where the context is cancelled, however that should be an ERROR since it makes vault inoperable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants