Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

advice on performance #71

Open
gerbyzation opened this issue Aug 24, 2020 · 8 comments
Open

advice on performance #71

gerbyzation opened this issue Aug 24, 2020 · 8 comments
Assignees
Labels
question Further information is requested

Comments

@gerbyzation
Copy link

gerbyzation commented Aug 24, 2020

We've been growing our test data set to about 2.5k rules in the policy set, which is making calls to the enforcer quite slow. Is there something that can be done to speed things up, or did I just pick a terrible model for performance?

model:

[request_definition]
r = sub, obj, act

[policy_definition]
p = sub, obj, act

[role_definition]
g = _, _

[policy_effect]
e = some(where (p.eft == allow))

[matchers]
m = g(r.sub, p.sub) && keyMatch(r.obj, p.obj) && r.act == p.act

example policy:

g,act.io/org:acmecorp/group:admins,act.io/org:acmecorp/group:members
p,act.io/org:acmecorp/group:admins,act.io/org:acmecorp/info,edit
p,act.io/org:acmecorp/group:admins,act.io/org:acmecorp/group:members,manage
p,act.io/org:acmecorp/group:members,act.io/org:acmecorp,view
p,act.io/org:acmecorp/group:members,act.io/org:acmecorp/plan:*,view
p,act.io/org:acmecorp/group:members,act.io/org:acmecorp/plan:*,create

g,act.io/user:cde7860e-5459-413c-969a-9536edbb1980,act.io/org:acmecorp/group:admins
g,act.io/user:554a09e6-2d6a-406d-aef0-2ea0e09a0e9f,act.io/org:acmecorp/group:members

g,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/group:members
p,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/info,edit
p,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/group:members,manage
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1,view
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/submission:*,approve
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,create
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,read
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,update

g,act.io/user:cde7860e-5459-413c-969a-9536edbb1980,act.io/org:acmecorp/campaign:1/group:admins
g,act.io/user:3ae34b92-a0f8-4b3f-a7bd-877c9ba303a4,act.io/org:acmecorp/campaign:1/group:members

g,act.io/org:acmecorp/campaign:2/group:admins,act.io/org:acmecorp/campaign:2/group:members
p,act.io/org:acmecorp/campaign:2/group:admins,act.io/org:acmecorp/campaign:2/info,edit
p,act.io/org:acmecorp/campaign:2/group:admins,act.io/org:acmecorp/campaign:2/group:members,manage
p,act.io/org:acmecorp/campaign:2/group:members,act.io/org:acmecorp/campaign:2,view
p,act.io/org:acmecorp/campaign:2/group:members,act.io/org:acmecorp/campaign:2/submission:*,approve
p,act.io/org:acmecorp/campaign:2/group:members,act.io/org:acmecorp/campaign:2/version:*,create
p,act.io/org:acmecorp/campaign:2/group:members,act.io/org:acmecorp/campaign:2/version:*,read
p,act.io/org:acmecorp/campaign:2/group:members,act.io/org:acmecorp/campaign:2/version:*,update

g,act.io/user:925d614b-12f4-4048-91c5-adf9f278dcb9,act.io/org:acmecorp/campaign:2/group:admins
g,act.io/user:6e48501a-eff8-4ec7-8215-638c7726e163,act.io/org:acmecorp/campaign:2/group:members
@hsluoyz
Copy link
Member

hsluoyz commented Aug 25, 2020

Hi @gerbyzation I want to know what kinds of rules count for most of the 2.5k rules? How many p rules and how many g rules? I saw a lot of:

g,act.io/user:cde7860e-5459-413c-969a-9536edbb1980,act.io/org:acmecorp/campaign:1/group:admins
g,act.io/user:3ae34b92-a0f8-4b3f-a7bd-877c9ba303a4,act.io/org:acmecorp/campaign:1/group:members

So each user ID will appear as one g rule, right? But g rules will be loaded into memory as a RBAC inheritance tree, so it won't be slow.

I also see bunches of rules like this:

g,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/group:members
p,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/info,edit
p,act.io/org:acmecorp/campaign:1/group:admins,act.io/org:acmecorp/campaign:1/group:members,manage
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1,view
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/submission:*,approve
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,create
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,read
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,update

Will this same rule set duplicate from 1, 2, 3, .. to 100000? If yes, why not use one single "pattern" rule set to replace them all? So you will end up with a very small set of rules.

@hsluoyz hsluoyz self-assigned this Aug 25, 2020
@hsluoyz hsluoyz added the question Further information is requested label Aug 25, 2020
@gerbyzation
Copy link
Author

gerbyzation commented Aug 25, 2020

@hsluoyz at the moment we have ~2.5k p rules and ~1k g rules.

Yes, we'd create these kind of rule set for each org and project/campaign that is created. Our plan was to have granular rules to allow permissions to be configurable for a specific org, project or team.

Thanks for taking a look!

@hsluoyz
Copy link
Member

hsluoyz commented Aug 25, 2020

Will campaign:1 people be able to access campaign:2 or even another org's resources (aka cross-domain access)?

@gerbyzation
Copy link
Author

@hsluoyz yes, when given permission ofcourse. We want people to have a single account that can have access to various organizations and/or campaigns.

@hsluoyz
Copy link
Member

hsluoyz commented Aug 25, 2020

First, you can merge these into one:

p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,create
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,read
p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,update

into:

p,act.io/org:acmecorp/campaign:1/group:members,act.io/org:acmecorp/campaign:1/version:*,(create)|(read)|(update)

Of course, model needs to be updated.

@gerbyzation
Copy link
Author

@hsluoyz ok so that would change the matcher to m = g(r.sub, p.sub) && keyMatch(r.obj, p.obj) && regexMatch(r.act, p.act)

How does the regexMatch relate to keyMatch in terms of efficiency?

@wakemaster39
Copy link

wakemaster39 commented Sep 18, 2020

So i'll toss my findings in here rather than creating a new ticket, Python is not Go and performance is approaching 50x time worse than the benchmarks. Just a simple sample implementation:

import time
from pathlib import Path

from casbin import Enforcer

root_dir = Path(__file__).parent
enforcer: Enforcer = Enforcer(f"{str(root_dir)}/rbac_model.conf")

for x in range(11000):
    enforcer.add_policy(f"/user{x}", "/obj", "read")


s = time.time()
a = enforcer.enforce("/user", "obj", "read")
print(a, (time.time() - s) * 1000)

Output is False 62.0729923248291, according to benchmarks it should be ~2. Now python is not go, so it is not that I am complaining code is bad here or anything this is just a fact of our chosen language. I have PRs open to implemented filter policy support which is a massively big win here as my ruleset is 50k so being able to get it down to a few hundred or a few thousands for certain long large actions a major plus. See #65 and will add the sqlalchemy adapter support once approved.

You should use filtered policies independent of what I put below as loading up a millions of entries is a massive long time and there isn't much you can do about that impact.

The root cause of this performance issue is :

for i, pvals in enumerate(self.model.model["p"]["p"].policy):

It is a O(n) operation and as the number of rules scale the performance will degrade, not much way around this. This really comes from the fact of how generalized Casbin is, it starts as a standard object/subject/action rules system but it doesn't use any of that knowledge in its engine, you could completely disregard that strategy and just have a single string comparison if you wanted.

For me, for performance you need to remove generalization. If you can make assumptions, you can find way to work some magic. And for me, that magic was assuming that we have a standard object/subject/action base rule system and you can add whatever additional fields on the end, just having these three are important.

So this is my request definition I am working with:

[request_definition]
r = sub, obj, act

The important bit now is, how can I shrink the size of my model so that the n is small and fast. The way I did this was tackling the model storage. By default the model object stores the rules in a list and the rule is also a list. But there is nothing stopping us from injecting our own storage engine, so I have made my own model class and my own storage engine for the rules:

class FilteredPolicy:
    _cache: Dict[str, Dict[str, Set[Sequence[str]]]]
    _current_filter: Optional[Set[Sequence[str]]]

    def __init__(self):
        self._cache = {}
        self._current_filter = None

    def __iter__(self):
        yield from self.__get_policy()

    def __len__(self):
        return len(list(self.__get_policy()))

    def __contains__(self, item):
        k1 = item[1]
        k2 = item[2]
        if k1 not in self._cache:
            return False
        if k2 not in self._cache[k1]:
            return False
        return tuple(item) in self._cache[k1][k2]

    def append(self, item: Sequence[str]) -> None:
        k1 = item[1]
        k2 = item[2]
        if k1 not in self._cache:
            self._cache[k1] = {}
        if k2 not in self._cache[k1]:
            self._cache[k1][k2] = set()

        self._cache[k1][k2].add(tuple(item))

    def remove(self, policy: Sequence[str]) -> bool:
        k1 = policy[1]
        k2 = policy[2]
        if k1 not in self._cache:
            return True
        if k2 not in self._cache[k1]:
            return True

        self._cache[k1][k2].remove(tuple(policy))
        return True

    def __get_policy(self):
        if self._current_filter is not None:
            return (list(x) for x in self._current_filter)
        else:
            return (list(v2) for v in self._cache.values() for v1 in v.values() for v2 in v1)

    def apply_filter(self, k1, k2):
        if k1 not in self._cache:
            self._current_filter = set()
        elif k2 not in self._cache[k1]:
            self._current_filter = set()
        else:
            self._current_filter = self._cache[k1][k2]

    def clear_filter(self) -> None:
        self._current_filter = None

class FastModel(Model):
    def add_def(self, sec, key, value):
        super().add_def(sec, key, value)
        if sec == "p" and key == "p":
            self.model[sec][key].policy = FilteredPolicy()

This is where the assumption of the request format matters, I purposely abuse the predefined order of operations to store my model in layered dictionaries. I also mimic the List protocol so this acts as a drop in replacement without changing any other code in the code base. On the surface this doesn't do anything until it is combined with one last modification to the enforcer:

class FastEnforcer(Enforcer):
    @staticmethod
    def new_model(path: str = "", text: str = "") -> FastModel:
        """creates a model."""

        m = FastModel()
        if len(path) > 0:
            m.load_model(path)
        else:
            m.load_model_from_text(text)

        return m

    def enforce(self, *rvals: str) -> bool:
        self.model.model["p"]["p"].policy.apply_filter(rvals[1], rvals[2])
        result = super().enforce(*rvals)
        self.model.model["p"]["p"].policy.clear_filter()
        return result

Putting this all together, I now have shrunk down the size of my n to only be the number of rules that match both the action and the object, so the number of subjects that have individual rules that match here is the size of n. Now it doesn't matter how big my global rule set is, it only matters the number of rules matching those two unique identifiers. This allows me to remove all need for trying to combing down rule sets and use regular expressions and keep my ruleset condensed. If fact condensing rulesets is now bad and not performant because regular expressions are slow then compared to string comparisons.

If i modify my initial experiment above to have 10 million rows, you can see the massive performance gains:

for x in range(100):
    for y in range(100000):
        enforcer.add_policy(f"/user{x}", f"/obj{y}", "read")


s = time.time()
# this is the absolute worst case last entry and should require iterating 10M rows and be very slow
a = enforcer.enforce("/user99", "/obj99999", "read")
print(a, (time.time() - s) * 1000)

Output: True 0.8349418640136719

So what are the side effects? Speed, speed everywhere. Adds are significantly sped up on large rule sets, deletes get the same treatment and so enforcements. You are tied to your models request format, but honestly you are also tied to this format based on the need to persist the model anyways. If your model requires additional layers of caching, or your model request order does not match the order given here the cache mechanism will explode.

This is a rather simple fix, if you adjust the FilteredPolicy to meet your needs then you can drop in replace it for your current model format. We have not touched any of the internal about how Casbin works so any other extensions or plugins someone one day generates should also continue to function as well.

Now I put all of this here hoping it helps someone with improving their performance, but also as a question to the devs. Is something like this possible to merge into the main package, or if does needs to be an external package. I am happy to create a new package that extends the current casbin implementation like this but I figured it would be best to see the feedback here first. I have a bunch of other management improvements that I would likely ship in an external package but would rather have people be able to get as much benefit as possible without them hopefully find my random package.

@wakemaster39
Copy link

I have taken what I created here, generalized it a whole lot and did the needful and wrapped it up into a package.

https://pypi.org/project/fastbin/

I will be porting my current production usage away from the code above to this package and roll it to v1 stable release next week hopefully. Been using the sample code for a few months running stable so there shouldn't be too many issues to sort out with the package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
Development

Successfully merging a pull request may close this issue.

3 participants