-
-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
advice on performance #71
Comments
Hi @gerbyzation I want to know what kinds of rules count for most of the 2.5k rules? How many p rules and how many g rules? I saw a lot of:
So each user ID will appear as one I also see bunches of rules like this:
Will this same rule set duplicate from 1, 2, 3, .. to 100000? If yes, why not use one single "pattern" rule set to replace them all? So you will end up with a very small set of rules. |
@hsluoyz at the moment we have ~2.5k p rules and ~1k g rules. Yes, we'd create these kind of rule set for each org and project/campaign that is created. Our plan was to have granular rules to allow permissions to be configurable for a specific org, project or team. Thanks for taking a look! |
Will |
@hsluoyz yes, when given permission ofcourse. We want people to have a single account that can have access to various organizations and/or campaigns. |
First, you can merge these into one:
into:
Of course, model needs to be updated. |
@hsluoyz ok so that would change the matcher to How does the |
So i'll toss my findings in here rather than creating a new ticket, Python is not Go and performance is approaching 50x time worse than the benchmarks. Just a simple sample implementation: import time
from pathlib import Path
from casbin import Enforcer
root_dir = Path(__file__).parent
enforcer: Enforcer = Enforcer(f"{str(root_dir)}/rbac_model.conf")
for x in range(11000):
enforcer.add_policy(f"/user{x}", "/obj", "read")
s = time.time()
a = enforcer.enforce("/user", "obj", "read")
print(a, (time.time() - s) * 1000) Output is You should use filtered policies independent of what I put below as loading up a millions of entries is a massive long time and there isn't much you can do about that impact. The root cause of this performance issue is : pycasbin/casbin/core_enforcer.py Line 238 in 88bcf96
It is a O(n) operation and as the number of rules scale the performance will degrade, not much way around this. This really comes from the fact of how generalized Casbin is, it starts as a standard object/subject/action rules system but it doesn't use any of that knowledge in its engine, you could completely disregard that strategy and just have a single string comparison if you wanted. For me, for performance you need to remove generalization. If you can make assumptions, you can find way to work some magic. And for me, that magic was assuming that we have a standard So this is my request definition I am working with:
The important bit now is, how can I shrink the size of my model so that the class FilteredPolicy:
_cache: Dict[str, Dict[str, Set[Sequence[str]]]]
_current_filter: Optional[Set[Sequence[str]]]
def __init__(self):
self._cache = {}
self._current_filter = None
def __iter__(self):
yield from self.__get_policy()
def __len__(self):
return len(list(self.__get_policy()))
def __contains__(self, item):
k1 = item[1]
k2 = item[2]
if k1 not in self._cache:
return False
if k2 not in self._cache[k1]:
return False
return tuple(item) in self._cache[k1][k2]
def append(self, item: Sequence[str]) -> None:
k1 = item[1]
k2 = item[2]
if k1 not in self._cache:
self._cache[k1] = {}
if k2 not in self._cache[k1]:
self._cache[k1][k2] = set()
self._cache[k1][k2].add(tuple(item))
def remove(self, policy: Sequence[str]) -> bool:
k1 = policy[1]
k2 = policy[2]
if k1 not in self._cache:
return True
if k2 not in self._cache[k1]:
return True
self._cache[k1][k2].remove(tuple(policy))
return True
def __get_policy(self):
if self._current_filter is not None:
return (list(x) for x in self._current_filter)
else:
return (list(v2) for v in self._cache.values() for v1 in v.values() for v2 in v1)
def apply_filter(self, k1, k2):
if k1 not in self._cache:
self._current_filter = set()
elif k2 not in self._cache[k1]:
self._current_filter = set()
else:
self._current_filter = self._cache[k1][k2]
def clear_filter(self) -> None:
self._current_filter = None
class FastModel(Model):
def add_def(self, sec, key, value):
super().add_def(sec, key, value)
if sec == "p" and key == "p":
self.model[sec][key].policy = FilteredPolicy() This is where the assumption of the request format matters, I purposely abuse the predefined order of operations to store my model in layered dictionaries. I also mimic the class FastEnforcer(Enforcer):
@staticmethod
def new_model(path: str = "", text: str = "") -> FastModel:
"""creates a model."""
m = FastModel()
if len(path) > 0:
m.load_model(path)
else:
m.load_model_from_text(text)
return m
def enforce(self, *rvals: str) -> bool:
self.model.model["p"]["p"].policy.apply_filter(rvals[1], rvals[2])
result = super().enforce(*rvals)
self.model.model["p"]["p"].policy.clear_filter()
return result Putting this all together, I now have shrunk down the size of my If i modify my initial experiment above to have 10 million rows, you can see the massive performance gains: for x in range(100):
for y in range(100000):
enforcer.add_policy(f"/user{x}", f"/obj{y}", "read")
s = time.time()
# this is the absolute worst case last entry and should require iterating 10M rows and be very slow
a = enforcer.enforce("/user99", "/obj99999", "read")
print(a, (time.time() - s) * 1000) Output: So what are the side effects? Speed, speed everywhere. Adds are significantly sped up on large rule sets, deletes get the same treatment and so enforcements. You are tied to your models request format, but honestly you are also tied to this format based on the need to persist the model anyways. If your model requires additional layers of caching, or your model request order does not match the order given here the cache mechanism will explode. This is a rather simple fix, if you adjust the Now I put all of this here hoping it helps someone with improving their performance, but also as a question to the devs. Is something like this possible to merge into the main package, or if does needs to be an external package. I am happy to create a new package that extends the current casbin implementation like this but I figured it would be best to see the feedback here first. I have a bunch of other management improvements that I would likely ship in an external package but would rather have people be able to get as much benefit as possible without them hopefully find my random package. |
I have taken what I created here, generalized it a whole lot and did the needful and wrapped it up into a package. https://pypi.org/project/fastbin/ I will be porting my current production usage away from the code above to this package and roll it to v1 stable release next week hopefully. Been using the sample code for a few months running stable so there shouldn't be too many issues to sort out with the package. |
We've been growing our test data set to about 2.5k rules in the policy set, which is making calls to the enforcer quite slow. Is there something that can be done to speed things up, or did I just pick a terrible model for performance?
model:
example policy:
The text was updated successfully, but these errors were encountered: