
Audit events via gRPC endpoint #1761

Open
1 task done
rcrowe opened this issue Aug 21, 2023 · 4 comments

Comments

@rcrowe
Contributor

rcrowe commented Aug 21, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Feature description

We (UW) previously contributed the Kafka sink for audit logs & have noted that others have requested similar functionality, such as for Nats. Since contributing, we've been discussing internally whether Kafka is something we want to keep using; we may want to move to a more managed offering in AWS or GCP, such as Kinesis.

As I understand it, when you (Cerbos) take these contributions on you also take on the maintenance & support, which you were comfortable with at the time for Kafka. What if there was instead a way to configure a gRPC endpoint that these audit events are sent to, so that transports you may not be comfortable adopting, such as Nats, can live outside the main repo?

Much like the OpenTelemetry Collector, either Cerbos or the community could then offer a gRPC service, configured with all the different transports, to proxy these events to.

What would the ideal solution look like to you?

gRPC contract (a rough sketch follows this list):

  • Accepts access & decision audit entries
  • Maintained as a public contract under api/public
  • A Cerbos audit backend that pushes entries to the configured endpoint

Optional:

  • Separate repo under the Cerbos organisation for the audit collector
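
For concreteness, here's a rough sketch of what the collector-side contract might look like, written as a Go interface rather than the eventual protobuf service definition. All names are placeholders; the entry types are intended to mirror the existing cerbos.audit.v1 access and decision log entries.

```go
// Hypothetical collector contract, sketched as a Go interface purely for
// illustration. In practice this would be a protobuf service maintained
// under api/public, and the entry types would be the existing
// cerbos.audit.v1 messages rather than the placeholders below.
package auditcollector

import "context"

// AccessLogEntry and DecisionLogEntry stand in for the generated audit log
// entry types; fields are omitted here for brevity.
type AccessLogEntry struct{}
type DecisionLogEntry struct{}

// Collector is the push contract Cerbos would call out to. Entries are
// batched so the number of RPCs stays manageable under load.
type Collector interface {
	WriteAccessLogEntries(ctx context.Context, entries []*AccessLogEntry) error
	WriteDecisionLogEntries(ctx context.Context, entries []*DecisionLogEntry) error
}
```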

Anything else?

No response

@charithe
Contributor

Hey! It's a very nice idea. We've always wanted to offload audit storage to specialised systems such as SIEMs but there wasn't (and, AFAIK, still isn't) a common standard for that. While it would be cool to define our own audit collection API, I am a little bit concerned about how adoptable it would be because it requires our users to write their own collectors and deploy them -- which would be an additional burden for most of them. Writing a good API + collector that can keep track of and persist a large volume of audit events without losing them, and ensure they are not tampered with in transit, is not trivial either. At this stage I am not confident that we have the resources to manage that well.

We've always wanted to make Cerbos extensible and allow third-party plugins to add additional functionality. We just haven't gotten around to making that easy and seamless yet. I feel like prioritising that would help address this issue as well because then advanced users such as UW could develop the audit sinks they need.

While we work on that though, is there an intermediate solution using existing functionality that can help address this? My initial thoughts were to use Kafka as an intermediary: Cerbos writes audit logs to Kafka and your log proxy reads from Kafka and writes to whatever other preferred destination you have. However, the downside is that you still need to run Kafka for that to work.

The other option I could think of is using a tool like Vector to read audit logs from Cerbos (using the file audit backend) and distribute to a sink of your choice.
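
To make the intermediary idea concrete, the relay could be as small as a consumer loop along these lines. This is just a sketch: it assumes the segmentio/kafka-go client, and the broker address, topic name and forward() destination are made up for the example.

```go
// Minimal sketch of the intermediary approach: read Cerbos audit entries
// from the Kafka topic the Kafka audit backend writes to, and forward them
// to some other destination of your choosing.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// forward is a stand-in for whatever destination you actually want (NATS,
// Kinesis, Pub/Sub, ...). The entries arrive as the JSON Cerbos already
// produces to Kafka.
func forward(ctx context.Context, entry []byte) error {
	log.Printf("forwarding %d bytes", len(entry))
	return nil
}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // assumed broker address
		GroupID: "cerbos-audit-relay",       // assumed consumer group
		Topic:   "cerbos-audit-log",         // assumed topic name
	})
	defer r.Close()

	ctx := context.Background()
	for {
		m, err := r.ReadMessage(ctx) // offsets are committed via the consumer group
		if err != nil {
			log.Fatalf("read: %v", err)
		}
		if err := forward(ctx, m.Value); err != nil {
			log.Printf("forward failed: %v", err) // a real relay would retry / apply back-pressure here
		}
	}
}
```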

@rcrowe
Contributor Author

rcrowe commented Aug 21, 2023

@charithe Thank you for the great response 🙇🏻

I don't have any immediate needs; I was raising the idea to see whether it was a viable way to help Nats, as well as other transports in the future, make progress by moving them outside of the main Cerbos repo & therefore away from the concerns around the maintenance burden.

The idea wasn't to persist or really transform anything from the standard protobuf/JSON you have today; rather, it would just proxy to a backend (kafka/nats/pubsub) & apply any necessary back-pressure if that failed. While users would be forced to deploy another service, once the gRPC contract was in place I'd hope we could offer an out-of-the-box service that just requires configuration.

A big part of the experience we've enjoyed from Cerbos has been how simple it is to run, so I understand making that more complex could be a problem 👍🏻

@charithe
Contributor

Personally, I quite like the idea. Besides the usability aspect of requiring users to deploy a separate service, I think it's a good way to address this. However, because we are talking about audit events here, I think that a pull API might not be acceptable to some users.

Consider for example what happens when a Cerbos instance is shutting down. What should happen to the unscraped audit events? Should Cerbos persist them somewhere and remember to serve from that point the next time (if ever) it starts up? In an environment like Kubernetes where pods can come and go anytime, that store would have to be somewhere central and it would need to keep track of unscraped events so that they can be published to the final sink by some other (batch?) mechanism. Similarly, what should happen if the scraper stops working? How long should each Cerbos instance hold on to the audit events before discarding them or writing them off to an intermediate store? How would the scraper resume and ingest those events?

This is why I feel that a push API might be more appropriate for this use case. Of course, that would have its own set of issues but I think the implementation would be less complex compared to the pull version.

@rcrowe
Contributor Author

rcrowe commented Sep 5, 2023

I'm in agreement; I obviously didn't make it clear that the gRPC contract I was proposing was for a client calling from Cerbos out to an external service, i.e. push-based.

The optional proxy part I then mention is the Cerbos project providing a standalone service that implements this gRPC contract for common backends.

Like the Kafka implementation in Cerbos today, the implementation could run either sync (a fixed-size buffer that blocks when full) or async (a fixed-size FIFO that evicts the oldest entries when it overflows).
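
To illustrate the two modes, here's a minimal sketch using a bounded Go channel as the buffer; the function names and buffer size are made up for the example.

```go
// Sketch of the two delivery modes: sync blocks the caller when the buffer
// is full (back-pressure), async evicts the oldest entry instead.
package main

import "fmt"

type entry struct{ id int }

// enqueueSync applies back-pressure: the send blocks until there is room.
func enqueueSync(buf chan entry, e entry) {
	buf <- e
}

// enqueueAsync never blocks: when the buffer is full, the oldest entry is
// evicted to make room for the new one.
func enqueueAsync(buf chan entry, e entry) {
	select {
	case buf <- e:
	default:
		select {
		case old := <-buf: // FIFO eviction of the oldest entry
			fmt.Println("evicted", old.id)
		default:
		}
		select {
		case buf <- e:
		default: // still full under contention; drop the new entry
		}
	}
}

func main() {
	buf := make(chan entry, 2) // fixed-size buffer
	for i := 1; i <= 4; i++ {
		enqueueAsync(buf, entry{id: i}) // evicts entries 1 and 2 once full
	}
	fmt.Println("buffered:", len(buf))
}
```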
