Web site detects bot with my script, why? #1124

Closed
fran0254 opened this issue Oct 22, 2017 · 3 comments

@fran0254

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 0.13.0-alpha
  • Platform / OS version: Ubuntu 17.04 64 bits
  • URLs (if applicable): https://www.milanuncios.com/

What steps will reproduce the problem?

Please include code that reproduces the issue.

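```javascript
const puppeteer = require('puppeteer');

(async () => {
  // Launch a visible (non-headless) Chromium window
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://www.milanuncios.com/');
  // Give the page 3 seconds to finish loading before taking the screenshot
  await page.waitFor(3000);
  await page.screenshot({path: 'example.png'});
  // Browser left open on purpose so the page can be inspected
  //await browser.close();
})();
```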

What is the expected result?
[Screenshot: example.png]

Text of the page, translated to English:

As you were browsing http://www.milanuncios.com/, something about your browser made us think you were a bot. There are a few reasons this might happen:

  • You're a power user moving through this website with super-human speed.
  • You've disabled JavaScript in your web browser.
  • A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this support article.

After completing the CAPTCHA below, you will immediately regain access to

If I use a regular web browser (Mozilla, Chrome, ...), this problem does not occur. Since Puppeteer uses the same engine (Chrome), I don't understand what the problem is, and I'm very surprised. Do you know what could cause this?

@vsemozhetbyt
Contributor

I cannot reproduce the issue with 0.13.0-alpha on Windows 7 x64 on the first script run. Could it be that you performed many other operations on the site and they looked suspicious to them? Algorithms for detecting automation can be elaborate and tricky.

@Garbee
Contributor

Garbee commented Oct 22, 2017

Headless mode does add "Headless" to the user agent string (the browser reports itself as "HeadlessChrome"). That's a pretty easy sign.

There are other methods of detecting headless mode, though, so the user agent isn't the only one.

They appear to be detecting repeated access patterns from the same location and flagging them. This isn't something Puppeteer can handle internally.

@ebidel
Contributor

ebidel commented Oct 22, 2017

Right. You could try spoofing the UA to see if that works, but there are likely more advanced detection algorithms behind the scenes. Sites use CAPTCHAs and scraping-bot detection precisely to prevent automation, so there's not much PPTR can do in these scenarios.

Related: #473

ebidel closed this as completed Oct 22, 2017

4 participants