Open Policy Agent checks #99

Open
coopernetes opened this issue Feb 24, 2023 · 16 comments
Labels
enhancement New feature or request plugins Extensibility of the framework

Comments

@coopernetes
Contributor

coopernetes commented Feb 24, 2023

Hi all! 👋

Given that git proxy's target use case is performing security & compliance policy-based checks, it would be nice to reuse existing policies written in Rego, with Open Policy Agent running either as a sidecar or as a separate process alongside the proxy.

There is also a WASM implementation but I don't have much experience with that myself so not sure if that fits this project.

Describe the solution you'd like

  1. Provide a JSON body as input data into a rego policy from git-proxy.
{
  "author": "tom@example.com",
  "committer": "dick@example.com",
  "date": "1970-01-01T00:00:00",
  "commitSHA": "74d2a005a8619f9f748436269bbb0ae7d28fc15f",
  "url": "https://github.com/finos/test-allowed-repo",
  "branch": "feat/new-widget",
  ...
}
  2. Launch the proxy with the following config.json using OPA in server mode or using embedded Rego & WASM.
"openPolicyAgent": {
  "enabled": true,
  "mode": "client",
  "url": "https://localhost:8181",
  "policies": [
    "authorized_committers",
    "contains_readme",
    "..."
  ]
}
"openPolicyAgent": {
  "enabled": true,
  "mode": "embedded",
  "policies": [
    "resources/authorized_committers.rego",
    "resources/contains_readme.rego",
    "..."
  ]
}
  3. Proxy sends an HTTP request to the OPA server or evaluates the policies in-place.
HTTP POST localhost:8181/v1/data/<policy_name>/allow
{
  "input": {
    "author": "tom@example.com",
    "committer": "dick@example.com",
    "date": "1970-01-01T00:00:00",
    "commitSHA": "74d2a005a8619f9f748436269bbb0ae7d28fc15f",
    "url": "https://github.com/finos/test-allowed-repo",
    "branch": "feat/new-widget"
  }
}
Response
{"result":false}
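The request and response in step 3 could be sketched in TypeScript roughly as follows. This is a minimal sketch, not git-proxy's implementation: `PushInput`, `OpaRequest`, `buildOpaRequest` and `isAllowed` are hypothetical names, and the only things taken from OPA itself are the `/v1/data/<policy_name>/allow` path, the top-level `input` key and the `result` field in the response.

```typescript
// Shape of the input document; field names follow the JSON example in step 1.
interface PushInput {
  author: string;
  committer: string;
  date: string;
  commitSHA: string;
  url: string;
  branch: string;
}

// Plain description of an HTTP call, independent of any HTTP client (hypothetical type).
interface OpaRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the Data API request for one policy's `allow` rule.
// The input document is wrapped under a top-level "input" key.
function buildOpaRequest(baseUrl: string, policy: string, input: PushInput): OpaRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    url: `${trimmed}/v1/data/${encodeURIComponent(policy)}/allow`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input }),
  };
}

// Interpret the response body. Anything other than an explicit `true`
// (including a missing "result" key) is treated as a deny.
function isAllowed(responseBody: string): boolean {
  const parsed: unknown = JSON.parse(responseBody);
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  return (parsed as { result?: unknown }).result === true;
}
```

The returned `OpaRequest` can be handed to whichever HTTP client the proxy already uses. Treating a missing `result` as a deny is a design choice in this sketch, so that an undefined rule fails closed.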

Describe alternatives you've considered
It may be worthwhile to write a simple interface around querying external systems, allowing the use of other policy engines such as HashiCorp Sentinel.
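One way such an interface might look is sketched below. Everything here is an assumption for illustration: `PolicyEngine`, `PolicyDecision` and `PolicyEngineRegistry` are hypothetical names, not existing git-proxy types, and an OPA or Sentinel adapter would be one implementation of the interface among others.

```typescript
// Outcome of a policy check; `reason` is optional context for logs or the user.
interface PolicyDecision {
  allowed: boolean;
  reason?: string;
}

// Hypothetical engine-agnostic contract: each policy engine sits behind the same method.
interface PolicyEngine {
  readonly name: string;
  evaluate(policy: string, input: Record<string, unknown>): Promise<PolicyDecision>;
}

// Registry that lets configuration select an engine by name.
class PolicyEngineRegistry {
  private readonly engines = new Map<string, PolicyEngine>();

  // Add an engine; duplicate names are rejected so config lookups stay unambiguous.
  register(engine: PolicyEngine): void {
    if (this.engines.has(engine.name)) {
      throw new Error(`Policy engine already registered: ${engine.name}`);
    }
    this.engines.set(engine.name, engine);
  }

  // Look up an engine by the name used in configuration.
  get(name: string): PolicyEngine {
    const engine = this.engines.get(name);
    if (engine === undefined) {
      throw new Error(`Unknown policy engine: ${name}`);
    }
    return engine;
  }
}
```

With this shape the proxy core only depends on `PolicyEngine`, and the choice between engines becomes a configuration concern.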

Additional context
Related issue(s):
#47

@JamieSlome
Member

JamieSlome commented Mar 2, 2023

Hi @coopernetes, thanks for raising the feature request 👏 I'm not familiar with Open Policy Agent so bear with me whilst I do some reading of the documentation.

A policy agent or engine is something we should take advantage of. There is a question as to whether this in itself should be configurable, as you rightly mentioned. As an example, as a developer deploying and using Git Proxy within an organization, I may want to use a policy framework or policy language I'm familiar with, nice 👍

I love your examples above; can you just explain what the benefits might be between these two for the developer?

Proxy sends an HTTP request to the OPA server or evaluates the policies in-place.

I'll drop further thoughts once I've read through the docs 📝

@JamieSlome JamieSlome self-assigned this Mar 2, 2023
@JamieSlome JamieSlome added the enhancement New feature or request label Mar 2, 2023
@coopernetes
Contributor Author

coopernetes commented Mar 4, 2023

I love your examples above; can you just explain what the benefits might be between these two for the developer?

Proxy sends an HTTP request to the OPA server or evaluates the policies in-place.

Ah, I may not have been clear. These are really just two different deployment methods. OPA can run as a server with a generic HTTP API to make authorization decisions based on the Rego policy language. Since git-proxy is currently implemented in Express, there is also an OPA WASM npm module for Rego compiled to binary form, which can be embedded as part of git-proxy itself.

When deploying git-proxy, you can either ship WASM-compiled policies as binaries with git-proxy itself using that npm package, or deploy OPA as a server running alongside git-proxy, either on the same host or on separate infrastructure. git-proxy would then make an HTTP request to ask for a policy decision, sending the input data along with it. There are a few limitations with the WASM approach, so the HTTP API is the most straightforward to integrate with.

The rationale behind incorporating OPA is that it would cover a lot of potential use cases with the extensibility of the proxy. Between developers, security and compliance officers, you could then have a common language to define policy using a more application-driven approach to these sorts of custom checks. There's also an existing ecosystem to leverage.

OPA has the concept of input data. This is where most of the heavy lifting for git-proxy would lie, as it would need to provide a JSON structure, based on the commit data, as input to the Rego policy. I think a lot of this is already in place. I'm new to the project so this is just an idea at the moment. I guess this would be a custom Step? I haven't dug into the code base enough yet but I assume it would be trivial to use that as input to OPA.
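As a rough illustration of that mapping, the sketch below turns commit data into one input document per commit. `PushAction` is a hypothetical stand-in for whatever the proxy's processing chain holds for a push, so its field names are assumptions. The output fields follow the JSON example in the issue description.

```typescript
// Hypothetical subset of what the proxy knows about a push.
// The real git-proxy action object may be shaped differently.
interface PushAction {
  repoUrl: string;
  branch: string; // may arrive as a full ref such as "refs/heads/main"
  commits: Array<{
    sha: string;
    authorEmail: string;
    committerEmail: string;
    timestamp: number; // seconds since the Unix epoch
  }>;
}

// Input document per commit, using the field names from the issue's example.
interface OpaInput {
  author: string;
  committer: string;
  date: string;
  commitSHA: string;
  url: string;
  branch: string;
}

// Produce one input document per commit in the push.
function toOpaInputs(action: PushAction): OpaInput[] {
  // Strip the "refs/heads/" prefix so policies can match on the short branch name.
  const branch = action.branch.replace(/^refs\/heads\//, "");
  return action.commits.map((commit) => ({
    author: commit.authorEmail,
    committer: commit.committerEmail,
    date: new Date(commit.timestamp * 1000).toISOString(),
    commitSHA: commit.sha,
    url: action.repoUrl,
    branch,
  }));
}
```

Evaluating one document per commit is only one option. A policy that needs to reason about the whole push could instead receive the full list under a single key.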

I'll provide some examples here when I get the chance to demonstrate some use cases.

@coopernetes
Contributor Author

coopernetes commented Mar 4, 2023

Here's a general design:

flowchart TB
    subgraph OPA
        direction LR
        C[OPA server] -.->|"{input: {email, committer}}"| D[authorized_committers.rego]
        C -.->|"{input: {url}}"| E[authorized_repos.rego]
        C -.->|"{input: {...}}"| F[...]
    end
    A(committer) -->|push| B(git proxy)
    B <-->|http| C
    B -->|"push"| G[GitHub]

Some example policies written in Rego.

Authorized committer & repos

Unauthorized committer & repos

@JamieSlome
Member

Sorry for the delay in my response!

@coopernetes, thanks for the time you've put into the above 👍 Definitely provides a clearer image of the potential deployment approaches for Git Proxy.

It feels like the separation of the policy engine and the execution of Git Proxy is desirable. Ultimately, we want the experience of Git Proxy to be highly configurable. It may hold that a team wants to use Rego to enforce their requirements or perhaps an alternative.

Maybe we need to consider making Git Proxy generic enough to interface with "any" form of policy description, whether it be allowing a developer to implement Rego policies or attaching custom JavaScript logic.

Would love to get your thoughts on this level of abstraction. Just thinking in terms of the familiarity of developers with Rego. I am not that well exposed to it yet, not to say that we shouldn't pursue this as the default option or an option.
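To make the "custom JavaScript logic" end of that abstraction concrete, here is a small sketch in which a policy is just a function from input to a yes/no answer. The names `PolicyCheck`, `authorizedCommitters` and `allowPush` are hypothetical, and the allow-list check is only an example of what a team might write instead of a Rego policy.

```typescript
// Minimal input for this example; a real input would carry more fields.
type PolicyInput = { committer: string; url: string };

// A policy is any function returning a decision, synchronously or not.
// An OPA-backed check could satisfy the same type by calling the OPA server.
type PolicyCheck = (input: PolicyInput) => boolean | Promise<boolean>;

// Example custom policy: the committer must be on an allow-list (case-insensitive).
function authorizedCommitters(allowed: readonly string[]): PolicyCheck {
  const allowList = new Set(allowed.map((email) => email.toLowerCase()));
  return (input) => allowList.has(input.committer.toLowerCase());
}

// Run every check in order; the push is allowed only if all of them allow it.
async function allowPush(checks: readonly PolicyCheck[], input: PolicyInput): Promise<boolean> {
  for (const check of checks) {
    if (!(await check(input))) {
      return false;
    }
  }
  return true;
}
```

Because Rego-backed and hand-written checks share one function type in this sketch, the proxy would not need to know which kind it is running.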

@maoo, would love to get your thoughts here too... ❤️

@maoo
Member

maoo commented Jun 26, 2023

Thanks Jamie for the mention.

@coopernetes - very interesting proposal, this is something that would help git-proxy become as generic as originally designed. I think it's worth trying to work together and build a Proof of Concept (PoC); below I'll share some random thoughts, eager to hear your feedback.

  1. I'd suggest starting a conversation around the high-level architectural diagram shared by @coopernetes 2 comments ago. When we have a clear idea of the current and desired diagrams, it's going to be easier to come up with a list of actions and assign tasks.
  2. I think that jumping on a quick meeting with the 3 of us (and anyone else that volunteers to put sweat equity into this) could drastically accelerate the development. WDYT? If you like the idea, please share your availability and timezone at help [at] finos [dot] org for next week; I'll take it from there
  3. I give my +1 on the OPA sidecar implementation (VS rego parsing from git-proxy), as it leads to less overlap between git-proxy and OPA. We could even consider git-proxy as an OPA "plugin" that provides an input endpoint for the git protocol (and the exercise on step 1 could confirm - or not - this approach)
  4. I agree with Jamie's point to still allow custom JS implementation - and potentially explore the possibility to integrate OPA with custom JS (this really depends on git-proxy use cases VS rego features and flexibility). We can consider this as one of the requirements when designing the PoC on step 1.

Off-topic - @coopernetes could well be one of the best GH usernames I've ever seen 😄

@JamieSlome
Member

@maoo @coopernetes - I'll schedule an open invite call and we can start hashing this out.

Generally, happy with reducing overlap where we can between Git Proxy and OPA, but really keen to keep ease of use, installation and deployment as simple as possible.

@grovesy
Member

grovesy commented Jun 28, 2023

I really like this. Some time back, in the early architecture, it was intended that the central 'chain' loop that runs through actions would have a mechanism to post synchronous and asynchronous webhooks out.

To do this we needed a standard to be defined, and I think Open Policy Agent is a great candidate for the standard here.

Synchronous meaning: call this webservice, wait for a 'yes/no' response.
Asynchronous meaning: call out to a webservice posting the OPA payload; at some point a callback will come back with a yes/no response.

The current 'manual approval' flow is an asynchronous flow - i.e. we are waiting for a human to come in and hit an approve button to unblock the processing chain.... Potentially the manual review/approval flow could also conform to the OPA standard?
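The two webhook styles could be modelled along these lines. This is a sketch under assumptions: `CheckResult`, `PendingDecisions` and `awaitDecision` are hypothetical names, and the "ticket" stands for whatever identifier a callback would carry back to the proxy.

```typescript
// A synchronous check answers immediately. An asynchronous one parks the push
// until a callback (for example a human approval) supplies the answer.
type CheckResult =
  | { kind: "decided"; allowed: boolean }
  | { kind: "pending"; ticket: string };

// In-memory store of pushes waiting for a callback (hypothetical helper).
class PendingDecisions {
  private readonly waiting = new Map<string, (allowed: boolean) => void>();

  // Register a ticket and return a promise that settles when the callback arrives.
  wait(ticket: string): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      this.waiting.set(ticket, resolve);
    });
  }

  // Called by the callback endpoint. Returns false for unknown or already-settled tickets.
  resolve(ticket: string, allowed: boolean): boolean {
    const settle = this.waiting.get(ticket);
    if (settle === undefined) {
      return false;
    }
    this.waiting.delete(ticket);
    settle(allowed);
    return true;
  }

  // Number of pushes still waiting for a decision.
  get size(): number {
    return this.waiting.size;
  }
}

// Uniform way for the chain to obtain a yes/no, whichever style the check used.
async function awaitDecision(result: CheckResult, pending: PendingDecisions): Promise<boolean> {
  return result.kind === "decided" ? result.allowed : pending.wait(result.ticket);
}
```

An in-memory map loses pending approvals on restart, so a real deployment would presumably keep this state in a durable store.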

@grovesy
Member

grovesy commented Jun 29, 2023

Just to echo @JamieSlome, it's key that git-proxy works out of the box and has a really simple deployment model (i.e. minimal services/deployments).

The flip side is to maximize compatibility with any open, adopted standard - so I do like the embedded idea of OPA controls

@coopernetes
Contributor Author

coopernetes commented Jul 6, 2023

Sorry for the delay. I do see the need to balance simplicity in git-proxy via built-in features and checks vs embedding another policy framework or engine into the project.

This is probably a good candidate for an extension point in the form of a plugin. Happy to help where I can to prove this out in a PoC and/or contribute an implementation. We have a small but growing Rego policy development practice that would be a natural fit as described here, but I appreciate it may not be suitable to implement directly in the proxy.

@JamieSlome
Member

@coopernetes - awesome! 🎉 Any help is massively appreciated.

I'll be scheduling a catch-up for Git Proxy soon, where we can all jump on a call and bash heads.

Just finishing up the tail end of some clean up work of the library, so we can actually start coding up features post our chat. I'll create an issue with the outstanding work as a precursor to our call.

@coopernetes
Contributor Author

Let's keep this issue to track the specific OPA enhancement. I'll comment on the original extensibility issue (#47) and add details and a design proposal for how we can support a plugin-style ecosystem to extend git-proxy from its core functionality.

@JamieSlome
Member

Perfect @coopernetes 🙌

@coopernetes
Contributor Author

coopernetes commented Aug 25, 2023

The current 'manual approval' flow is an asynchronous flow - i.e. we are waiting for a human to come in and hit an approve button to unblock the processing chain.... Potentially the manual review/approval flow could also conform to the OPA standard?

@grovesy this is definitely doable. OPA has a few mechanisms for handling external data. It depends on where we think the best place for the "state" of this sort of asynchronous flow should live.

Thinking more, this is another good candidate for increased modularity. By default, we want to avoid tight coupling between git-proxy, its own data sources, OPA & its potential sources and any other system needed to make authorization decisions. OPA and git-proxy can share a sink but that should be configurable too. OPA is very generic in this regard so it presents an opportunity for multiple use cases through it as a standard API. For now, I see it as an option alongside the current config-based approach if people want to further externalize a policy.

@maoo maoo added the plugins Extensibility of the framework label Nov 6, 2023
@JamieSlome JamieSlome removed their assignment Nov 6, 2023
@JamieSlome
Member

@coopernetes, I think it would be great to get a simple PoC together to demonstrate all types of desirable behavior. From my perspective, I'd like to be able to express policy and policy changes without having to re-deploy Git Proxy in the production environment.

Moreover, I'd want my policies to be able to interface with database(s), i.e. verifying the presence of a given license in an approved license inventory. Does OPA / Rego support external HTTP / API connections and calls?

@coopernetes
Contributor Author

@JamieSlome for sure - I'll work on putting together a small PoC in the next few weeks. We probably want to step through each of your points and discuss how best to integrate this with git-proxy.


I'd like to be able to express policy and policy changes without having to re-deploy Git Proxy in the production environment.

That should be achievable by having OPA load its policies via bundles as part of its management APIs & architecture. From the docs:

OPA can periodically download bundles of policy and data from remote HTTP servers.
...
Bundles provide an alternative to pushing policies into OPA via the REST APIs. By configuring OPA to download bundles from a remote HTTP server, you can ensure that OPA has an up-to-date copy of policies and data required for enforcement at all times.

An OSPO's policy can be written in Rego and distributed as a bundle to a trusted, published source. OPA can then ingest that bundle on demand or periodically. You can host this bundle on a custom HTTP server, cloud storage, or an OCI/Docker registry.
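For reference, the OPA side of this is driven by the `services` and `bundles` sections of OPA's configuration file. The sketch below builds such a configuration as a plain object that could be serialized for OPA to load. The service name `policy_store`, the bundle name `git_proxy`, the URL and the polling intervals are all made-up values, and `buildBundleConfig` is a hypothetical helper rather than part of any library.

```typescript
// Types mirror the `services` and `bundles` sections of OPA's configuration.
interface BundleConfig {
  services: Record<string, { url: string }>;
  bundles: Record<string, {
    service: string;
    resource: string;
    polling: { min_delay_seconds: number; max_delay_seconds: number };
  }>;
}

// Build a configuration that makes OPA poll one bundle from one HTTP service.
// "policy_store" and "git_proxy" are placeholder names chosen for this sketch.
function buildBundleConfig(
  serviceUrl: string,
  resource: string,
  minDelaySeconds = 60,
  maxDelaySeconds = 120,
): BundleConfig {
  if (minDelaySeconds > maxDelaySeconds) {
    throw new Error("minDelaySeconds must not exceed maxDelaySeconds");
  }
  return {
    services: { policy_store: { url: serviceUrl } },
    bundles: {
      git_proxy: {
        service: "policy_store",
        resource,
        polling: {
          min_delay_seconds: minDelaySeconds,
          max_delay_seconds: maxDelaySeconds,
        },
      },
    },
  };
}

// Serialize for writing to the file passed to OPA at startup.
function serializeBundleConfig(config: BundleConfig): string {
  return JSON.stringify(config, null, 2);
}
```

With a configuration like this, changing a policy means publishing a new bundle, and neither OPA nor git-proxy needs to be redeployed.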


Moreover, I'd want my policies to be able to interface with database(s), i.e. verifying the presence of a given license in an approved license inventory. Does OPA / Rego support external HTTP / API connections and calls?

The above bundle functionality can be used to manage policy (Rego files) as well as data. How you create that bundle of external data and distribute it to your git-proxy+OPA deployment is somewhat unique to each organization so I don't think we need a strong opinion in git-proxy.

There are no native database connectors in OPA as far as I know. As for how git-proxy can be used with OPA and external data sources, I see three options:

  1. git-proxy handles the database connection and the logic to pass the data into OPA as input. This is simplest for OPA but adds complexity to this project. A standalone git-proxy plugin could be developed to implement a specific datasource integration. This doesn't require OPA to be redeployed or refreshed, as it reads the data from the input of the policy authorization request. OPA docs describe this as "overloading input".
  2. Use the bundling system and have OPA refresh its data from an external source. Only HTTP, cloud storage and OCI/Docker are supported AFAIU. This is the most flexible option and allows git-proxy to act as a dumb "client" while OPA handles the refreshing of data used by its policies from other data sources that git-proxy doesn't have to be made aware of. Docs
  3. OPA makes the calls directly to an external source. This seems quite limited and would tightly couple the policy logic in your Rego files with those external data sources. It also relies on the built-in functions such as http.send and doesn't handle retry or timeouts. I personally wouldn't use this since we can rely on eventual consistency using the other two options above. OPA docs.

Based on the comparison of the options above, I would recommend using the bundle API. It requires more work for OSPOs to build out those applications and publishing steps but is the most flexible and least opinionated approach while still taking advantage of OPA.
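For comparison, option 1 ("overloading input") amounts to something like the sketch below on the git-proxy side: the proxy fetches the external data itself and nests it in the input document before asking OPA for a decision. `withExternalData`, the `external` key and the `license` field in the usage note are hypothetical. The issue's example input has no license field.

```typescript
// Return a copy of the input with externally sourced data nested under "external".
// The original input object is left untouched.
function withExternalData<T extends object>(
  input: T,
  external: Record<string, unknown>,
): T & { external: Record<string, unknown> } {
  return { ...input, external: { ...external } };
}
```

A caller might pass `{ url, license: "Apache-2.0" }` together with `{ approved_licenses: [...] }` loaded from its own inventory, and a Rego policy could then compare `input.license` against `input.external.approved_licenses`. The trade-off noted above still applies: every data source handled this way becomes git-proxy's responsibility rather than OPA's.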

@JamieSlome
Member

@coopernetes; beautiful overview ⭐

Shall we schedule some time to chat together to maybe butt heads on desired architecture and think about whether this meets our needs? We can then look to demonstrate some PoCs if we have any ambiguity from our discussion.

Broadly, the OPA server model seems cleaner; but I do want us to have some consideration for usage and startup for other early adopters. That said, I'm happy for us to stay focused on our use cases and broaden our horizons as adoption improves.


4 participants