
Audit events via gRPC endpoint #1761

Open
1 task done
rcrowe opened this issue Aug 21, 2023 · 4 comments

Comments

@rcrowe
Contributor

rcrowe commented Aug 21, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Feature description

We (UW) previously contributed the Kafka sink for audit logs & have noted that others have requested similar functionality, such as for Nats. Since contributing, we've been discussing internally whether Kafka is something we want to keep using; we may want to move to a more managed offering in AWS or GCP, such as Kinesis.

As I understand it, when you (Cerbos) take these contributions on you also take on the maintenance & support, which you were comfortable with at the time for Kafka. What if there was instead a way to configure a gRPC endpoint that these audit events are sent to, so that transports you may not be comfortable adopting, such as Nats, can live outside the main repo?

Much like the OpenTelemetry Collector, either Cerbos or the community could then offer a gRPC service, configured with all the different transports, to proxy these events to.

What would the ideal solution look like to you?

gRPC contract (a rough sketch follows this list):

  • Accepts access & decision audit entries
  • Maintained as a public contract under api/public
  • A Cerbos audit backend that pushes entries to the configured endpoint

Optional:

  • Separate repo under the Cerbos organisation for the audit collector
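
For concreteness, here's a rough sketch of what the collector-side contract might look like, written as a Go interface rather than the eventual protobuf service definition. All names are placeholders; the entry types are intended to mirror the existing cerbos.audit.v1 access and decision log entries.

```go
// Hypothetical collector contract, sketched as a Go interface purely for
// illustration. In practice this would be a protobuf service maintained
// under api/public, and the entry types would be the existing
// cerbos.audit.v1 messages rather than the placeholders below.
package auditcollector

import "context"

// AccessLogEntry and DecisionLogEntry stand in for the generated audit log
// entry types; fields are omitted here for brevity.
type AccessLogEntry struct{}
type DecisionLogEntry struct{}

// Collector is the push contract Cerbos would call out to. Entries are
// batched so the number of RPCs stays manageable under load.
type Collector interface {
	WriteAccessLogEntries(ctx context.Context, entries []*AccessLogEntry) error
	WriteDecisionLogEntries(ctx context.Context, entries []*DecisionLogEntry) error
}
```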

Anything else?

No response

@charithe
Contributor

Hey! It's a very nice idea. We've always wanted to offload audit storage to specialised systems such as SIEMs but there wasn't (and, AFAIK, still isn't) a common standard for that. While it would be cool to define our own audit collection API, I am a little bit concerned about how adoptable it would be because it requires our users to write their own collectors and deploy them -- which would be an additional burden for most of them. Writing a good API + collector that can keep track of and persist a large volume of audit events without losing them, and ensure they are not tampered with in transit, is not trivial either. At this stage I am not confident that we have the resources to manage that well.

We've always wanted to make Cerbos extensible and allow third-party plugins to add additional functionality. We just haven't gotten around to making that easy and seamless yet. I feel like prioritising that would help address this issue as well because then advanced users such as UW could develop the audit sinks they need.

While we work on that though, is there an intermediate solution using existing functionality that can help address this? My initial thoughts were to use Kafka as an intermediary: Cerbos writes audit logs to Kafka and your log proxy reads from Kafka and writes to whatever other preferred destination you have. However, the downside is that you still need to run Kafka for that to work.

The other option I could think of is using a tool like Vector to read audit logs from Cerbos (using the file audit backend) and distribute to a sink of your choice.
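
To make the intermediary idea concrete, the relay could be as small as a consumer loop along these lines. This is just a sketch: it assumes the segmentio/kafka-go client, and the broker address, topic name and forward() destination are made up for the example.

```go
// Minimal sketch of the intermediary approach: read Cerbos audit entries
// from the Kafka topic the Kafka audit backend writes to, and forward them
// to some other destination of your choosing.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// forward is a stand-in for whatever destination you actually want (NATS,
// Kinesis, Pub/Sub, ...). The entries arrive as the JSON Cerbos already
// produces to Kafka.
func forward(ctx context.Context, entry []byte) error {
	log.Printf("forwarding %d bytes", len(entry))
	return nil
}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // assumed broker address
		GroupID: "cerbos-audit-relay",       // assumed consumer group
		Topic:   "cerbos-audit-log",         // assumed topic name
	})
	defer r.Close()

	ctx := context.Background()
	for {
		m, err := r.ReadMessage(ctx) // offsets are committed via the consumer group
		if err != nil {
			log.Fatalf("read: %v", err)
		}
		if err := forward(ctx, m.Value); err != nil {
			log.Printf("forward failed: %v", err) // a real relay would retry / apply back-pressure here
		}
	}
}
```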

@rcrowe
Contributor Author

rcrowe commented Aug 21, 2023

@charithe Thank you for the great response 🙇🏻

I don't have any immediate needs; I was raising the idea to see whether it was a viable way to help Nats, as well as other transports in the future, make progress by moving them outside of the main Cerbos repo & therefore away from the concerns around the maintenance burden.

The idea wasn't to persist or really transform anything from the standard protobuf/JSON you have today; rather, it would just proxy to a backend (kafka/nats/pubsub) & apply any necessary back-pressure if that failed. While users would be forced to deploy another service, once the gRPC contract was in place I'd hope we could offer an out-of-the-box service that just requires configuration.

A big part of the experience we've enjoyed from Cerbos has been how simple it is to run, so I understand making that more complex could be a problem 👍🏻

@charithe
Contributor

Personally, I quite like the idea. Besides the usability aspect of requiring users to deploy a separate service, I think it's a good way to address this. However, because we are talking about audit events here, I think that a pull API might not be acceptable to some users.

Consider for example what happens when a Cerbos instance is shutting down. What should happen to the unscraped audit events? Should Cerbos persist them somewhere and remember to serve from that point the next time (if ever) it starts up? In an environment like Kubernetes where pods can come and go anytime, that store would have to be somewhere central and it would need to keep track of unscraped events so that they can be published to the final sink by some other (batch?) mechanism. Similarly, what should happen if the scraper stops working? How long should each Cerbos instance hold on to the audit events before discarding them or writing them off to an intermediate store? How would the scraper resume and ingest those events?

This is why I feel that a push API might be more appropriate for this use case. Of course, that would have its own set of issues but I think the implementation would be less complex compared to the pull version.

@rcrowe
Contributor Author

rcrowe commented Sep 5, 2023

I'm in agreement; I obviously didn't make it clear that the gRPC contract I was proposing was for a client calling from Cerbos out to an external service, i.e. push-based.

The optional proxy part I then mention is the Cerbos project providing a standalone service that implements this gRPC contract for common backends.

Like the Kafka implementation in Cerbos today, the implementation could run either sync (a fixed-size buffer that blocks when full) or async (a fixed-size FIFO that evicts the oldest entries when it overflows).
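
To illustrate the two modes, here's a minimal sketch using a bounded Go channel as the buffer; the function names and buffer size are made up for the example.

```go
// Sketch of the two delivery modes: sync blocks the caller when the buffer
// is full (back-pressure), async evicts the oldest entry instead.
package main

import "fmt"

type entry struct{ id int }

// enqueueSync applies back-pressure: the send blocks until there is room.
func enqueueSync(buf chan entry, e entry) {
	buf <- e
}

// enqueueAsync never blocks: when the buffer is full, the oldest entry is
// evicted to make room for the new one.
func enqueueAsync(buf chan entry, e entry) {
	select {
	case buf <- e:
	default:
		select {
		case old := <-buf: // FIFO eviction of the oldest entry
			fmt.Println("evicted", old.id)
		default:
		}
		select {
		case buf <- e:
		default: // still full under contention; drop the new entry
		}
	}
}

func main() {
	buf := make(chan entry, 2) // fixed-size buffer
	for i := 1; i <= 4; i++ {
		enqueueAsync(buf, entry{id: i}) // evicts entries 1 and 2 once full
	}
	fmt.Println("buffered:", len(buf))
}
```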
