
Access true while robots.txt disallows all #22

Open
basvdijk opened this issue Sep 1, 2015 · 3 comments

Comments


basvdijk commented Sep 1, 2015

I have a robots.txt file containing two lines:

User-agent: *
Disallow: /

and used the example shown in the documentation:

var robots = require('robots')
  , parser = new robots.RobotsParser(
                'http://localhost/robots.txt',
                'Mozilla/5.0 (compatible; RobotTxtBot/1.0)',
                after_parse
            );

function after_parse(parser, success) {
  if(success) {
    parser.canFetch('*', '/', function (access, url, reason) {
      if (access) {
        console.log(' url: '+url+', access: '+access);
        // parse url ...
      }
    });
  }
}

Still, the code above gives the output:

url: /, access: true

while I expected access to be false.

@slrendell

I’m getting the same issue whilst running the sample code. Any fixes yet?

@basvdijk (Author)

The last commit on this repo is from Sep 17, 2018. I guess this library is not maintained anymore...

@slrendell

My mistake. I was checking against the full URL, 'http://facebook.com/', rather than just '/'. Working properly now, although not thoroughly tested yet.
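The pitfall here is that robots.txt Disallow rules are prefix matches against the URL *path*, so passing a full URL (scheme and host included) to a path check never matches. A minimal sketch of why that happens (this is an illustration of the matching idea, not the robots library's actual implementation):

```javascript
// Sketch of prefix-based robots.txt Disallow matching.
// NOT the robots library's real code -- just illustrates the path-vs-URL pitfall.
function isDisallowed(disallowRules, path) {
  // Each "Disallow:" value is treated as a prefix of the URL path.
  // An empty Disallow value allows everything, so skip it.
  return disallowRules.some(rule => rule !== '' && path.startsWith(rule));
}

const rules = ['/']; // from "Disallow: /"

// Checking a path behaves as expected:
console.log(isDisallowed(rules, '/'));                    // true  -> fetch denied

// Checking a full URL slips through, because the string starts
// with "http", not with the path prefix "/":
console.log(isDisallowed(rules, 'http://facebook.com/')); // false -> fetch allowed
```

So the second argument to a path-based check like canFetch should be the path portion of the URL only.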
