Puppeteer is being blocked by some sites (which uses distill networks) #4985
Comments
I was running in the default headless mode, and the site detected that it's a bot. I then set headless: false, but it is still detected. If I browse from my Chromium browser it works, but if I browse through Puppeteer with headless: false (which is again the Chromium browser), the site detects that it's a bot. |
This is outside the Puppeteer devs' control. You could use extensions to lower the detection rate, e.g. https://github.com/berstend/puppeteer-extra |
Shameless plug for my framework which defeats distill: https://nicoandmee.github.io/puppeteer-theater/ |
Did you try adding --user-agent to your Puppeteer args? const args = [ |
I tried this just now and it really works, you saved my life 😁 |
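The args array in the comment above is cut off, so its original contents are unknown. As a minimal sketch of the general idea only, assuming the goal is to pass Chromium's `--user-agent` switch through `puppeteer.launch`, something like the following could work. The helper names `buildLaunchOptions` and `openWithUserAgent` and the sample user-agent string are hypothetical, not taken from the comment.

```javascript
// Hypothetical helper: builds Puppeteer launch options that set a custom
// user agent via Chromium's --user-agent command-line switch.
function buildLaunchOptions(userAgent) {
  return {
    headless: false,
    args: [`--user-agent=${userAgent}`],
  };
}

// Placeholder value for illustration; substitute the user agent string
// reported by your own regular browser.
const SAMPLE_USER_AGENT =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'Chrome/75.0.3770.100 Safari/537.36';

// Defined but not invoked here; calling it requires the puppeteer package.
async function openWithUserAgent(url, userAgent = SAMPLE_USER_AGENT) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions(userAgent));
  const page = await browser.newPage();
  await page.goto(url);
  return { browser, page };
}
```

As the replies in this thread show, changing the user agent alone helped one person and made things worse for another, so treat it as one variable to experiment with rather than a fix.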
these args make it worse for me, instant blocking |
@LucasZanella @manibharathytu is the issue solved? |
For me it didn't work. Had to install a plugin I found somewhere that does the job. Can't remember the name |
Partially. This is because distill uses a machine-learning-based backend that takes in all the browser params and decides whether it's a bot or not, so it is very difficult to trick it completely. After heavy research and debugging their JS code, I was able to tweak my code to trick distill 4 out of 5 times. I had to randomize some of my Puppeteer browser fingerprint dynamically, and even my IP, to make it look like it's not a bot. I had to manually tweak, one by one, the browser parameters that distill's JS checks (e.g. audio/video support, installed extensions, mouse movement) so that distill thinks my Puppeteer is a normal Chromium browser. I don't remember all the parameters I had to tweak. You can get a good idea by debugging their JS to see which checks they run, then comparing against a normal Chromium browser to see which parameter differences make distill think Puppeteer is a bot. I vaguely remember being stuck on 2 or 3 native browser parameters that I couldn't change from JS; because of those differences, distill was able to detect it 1 out of 5 times. If those params can be hacked, distill can be tricked 100% of the time. Some references to get started: |
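The comparison step described above (normal Chromium versus Puppeteer-driven Chromium) can be sketched as a plain diff of two fingerprint objects. This is only an illustration under assumptions: the function name `diffFingerprints`, the property names, and the sample values are all made up, and they are not the checks distill actually performs. In practice the two objects might be collected by reading properties such as `navigator.webdriver` inside `page.evaluate` and in a regular browser console.

```javascript
// Returns the keys whose values differ between two flat fingerprint objects.
// Values are compared via their JSON serialization so arrays compare by content.
function diffFingerprints(normal, automated) {
  const keys = new Set([...Object.keys(normal), ...Object.keys(automated)]);
  const diffs = [];
  for (const key of [...keys].sort()) {
    const a = JSON.stringify(normal[key]);
    const b = JSON.stringify(automated[key]);
    if (a !== b) {
      diffs.push({ key, normal: normal[key], automated: automated[key] });
    }
  }
  return diffs;
}

// Made-up sample values for illustration only.
const normalChromium = {
  webdriver: false,
  pluginCount: 3,
  languages: ['en-US', 'en'],
};
const puppeteerChromium = {
  webdriver: true,
  pluginCount: 0,
  languages: ['en-US', 'en'],
};

// Lists the parameters worth investigating first.
console.log(diffFingerprints(normalChromium, puppeteerChromium));
```

With these sample objects the diff reports `pluginCount` and `webdriver` and skips `languages`, which matches in both.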
We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days. |
We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it! |
Please reopen the issue... I am still seeing the problem on some sites |
I can't get past Zillow no matter what |
I recently upgraded from Node.js 12 to Node.js 16. For that I had to switch from https://github.com/alixaxel/chrome-aws-lambda/tree/v3.1.1 to https://github.com/Sparticuz/chrome-aws-lambda/tree/puppeteer%4013.5.0 because I started seeing the issues described in this open bug: alixaxel/chrome-aws-lambda#264. After that I started seeing errors like |
Steps to reproduce
Tell us about your environment:
What steps will reproduce the problem?
Please include code that reproduces the issue.
What is the expected result?
It should get the proper response (which you can see by browsing to https://streeteasy.com)
https://drive.google.com/open?id=1-L3bjQWs9Et4Kk6dGOhUyi2ePEgtuEMM
Refer 1.png
What happens instead?
Captcha page is shown
https://drive.google.com/open?id=1-L3bjQWs9Et4Kk6dGOhUyi2ePEgtuEMM
Refer 2.png