Don't return first rule match for canFetch #18
Comments
This is a big issue, but I'm not totally sure that returning the last matching rule really is the fix. Is there any way to determine whether one rule is more specific than another? Ultimately we want to get the most specific rule.
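One possible way to define "most specific" is by path length: among all rules whose path is a prefix of the URL, take the one with the longest path, and prefer an allow rule when two paths are equally long. This is the criterion RFC 9309 describes. The sketch below only illustrates that idea. `mostSpecificRule`, the `{ type, path }` rule shape and `exampleRules` are hypothetical names, not this library's API, and wildcards are not handled.

```javascript
// Hypothetical helper, not the library's API:
// pick the matching rule with the longest path.
function mostSpecificRule(rules, urlPath) {
  let best = null;
  for (const rule of rules) {
    // Plain prefix match; wildcards (* and $) are deliberately not handled.
    if (!urlPath.startsWith(rule.path)) continue;

    const longer = best === null || rule.path.length > best.path.length;
    // On equal length, let an allow rule win over a disallow rule.
    const tieFavoursAllow =
      best !== null &&
      rule.path.length === best.path.length &&
      rule.type === 'allow' &&
      best.type !== 'allow';

    if (longer || tieFavoursAllow) best = rule;
  }
  return best; // null when no rule matches
}

// Assumed rule list for illustration only.
const exampleRules = [
  { type: 'allow', path: '/' },
  { type: 'disallow', path: '/admin/' },
  { type: 'allow', path: '/admin/public/' },
];
```

With these rules, `/admin/settings` resolves to the `/admin/` disallow rule and `/admin/public/page` to the longer `/admin/public/` allow rule, regardless of the order in which the rules appear in the file.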
...it depends on how you like to interpret the rules. In most ACL cases you write something like: Or the reverse case: For robots.txt there is normally no official "Allow" command; only "Disallow" is a standard command. So a robots.txt should normally contain only "Disallow" commands to ensure correct interpretation. And remember: robots.txt is NOT a "you should not crawl" command, it's more "please, don't crawl" or "crawling of... is not necessary". So in my eyes it's up to the creator of the ACL to ensure the correct order of rules and the correct use of "Allow", and there is no way to determine a "more specific rule". It's like the army: "last order rules", if you respect the "Allow" command.
Very true... but I guess it really depends on whether you want it to accurately interpret all robots.txt files or just the ones that strictly follow the spec (practically none of them).
Currently the first matching rule is returned, but I don't think that's a good idea.
For example, with this robots.txt canFetch will always return true:
User-Agent: *
Allow: /
Disallow: /admin/
Disallow: /redirect/
Proposed change in /lib/entry.js:
...this will return the last matching rule.
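To make the difference concrete, here is a minimal sketch comparing first-match and last-match evaluation on the robots.txt from this issue. It is not the actual code in /lib/entry.js. The `{ type, path }` rule shape and the names `issueRules`, `ruleMatches`, `canFetchFirstMatch` and `canFetchLastMatch` are assumptions made for illustration, and matching is a plain prefix check.

```javascript
// Assumed in-memory form of the robots.txt above, in file order.
const issueRules = [
  { type: 'allow', path: '/' },
  { type: 'disallow', path: '/admin/' },
  { type: 'disallow', path: '/redirect/' },
];

// A rule matches when the URL path starts with the rule path (no wildcards).
const ruleMatches = (rule, urlPath) => urlPath.startsWith(rule.path);

// First match wins: "Allow: /" matches every path, so nothing is ever blocked.
function canFetchFirstMatch(rules, urlPath) {
  const rule = rules.find((r) => ruleMatches(r, urlPath));
  return rule ? rule.type === 'allow' : true;
}

// Last match wins: later, narrower rules override the leading "Allow: /".
function canFetchLastMatch(rules, urlPath) {
  const matching = rules.filter((r) => ruleMatches(r, urlPath));
  const rule = matching[matching.length - 1];
  return rule ? rule.type === 'allow' : true;
}
```

For `/admin/login`, `canFetchFirstMatch` returns `true` while `canFetchLastMatch` returns `false`. Both return `true` for `/index.html`. Note that last-match still depends on the author putting the broad rule first.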