
Access true while robots.txt disallows all #22

Open
basvdijk opened this issue Sep 1, 2015 · 3 comments

Comments


basvdijk commented Sep 1, 2015

I have a robots.txt file containing two lines:

User-agent: *
Disallow: /

and used the example shown in the documentation:

var robots = require('robots')
  , parser = new robots.RobotsParser(
                'http://localhost/robots.txt',
                'Mozilla/5.0 (compatible; RobotTxtBot/1.0)',
                after_parse
            );

function after_parse(parser, success) {
  if(success) {
    parser.canFetch('*', '/', function (access, url, reason) {
      if (access) {
        console.log(' url: '+url+', access: '+access);
        // parse url ...
      }
    });
  }
}

Still, the code above gives the output:

url: /, access: true

while I expected access to be false.

@slrendell

I’m getting the same issue whilst running the sample code. Any fixes yet?

@basvdijk (Author)

The last commit on this repo is from Sep 17, 2018. I guess this library is not maintained anymore...

@slrendell

My mistake. I was checking against the full URL, 'http://facebook.com/', rather than just '/'. Working properly now, although not thoroughly tested yet.
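The pitfall here is that robots.txt Disallow rules are prefix matches against the URL *path*, so passing a full URL (scheme and host included) to a path check never matches. A minimal sketch of why that happens (this is an illustration of the matching idea, not the robots library's actual implementation):

```javascript
// Sketch of prefix-based robots.txt Disallow matching.
// NOT the robots library's real code -- just illustrates the path-vs-URL pitfall.
function isDisallowed(disallowRules, path) {
  // Each "Disallow:" value is treated as a prefix of the URL path.
  // An empty Disallow value allows everything, so skip it.
  return disallowRules.some(rule => rule !== '' && path.startsWith(rule));
}

const rules = ['/']; // from "Disallow: /"

// Checking a path behaves as expected:
console.log(isDisallowed(rules, '/'));                    // true  -> fetch denied

// Checking a full URL slips through, because the string starts
// with "http", not with the path prefix "/":
console.log(isDisallowed(rules, 'http://facebook.com/')); // false -> fetch allowed
```

So the second argument to a path-based check like canFetch should be the path portion of the URL only.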
