A weird problem: get 404 page by read method #109

Open
winglight opened this issue Oct 19, 2018 · 0 comments

Here's the URL: http://www.8wenku.com/book/2414

When I open this page in Postman or a browser, it loads as a normal web page. But when I fetch it with the read method, the article passed to the callback contains a 404 page.

Here is the code:

// Assumption: `read` comes from the node-readability package.
// `url` and `callback` are defined in the enclosing scope.
var read = require('node-readability');

read(url, {
    strictSSL: false,
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
}, function (err, article, meta) {
    if (err) {
        console.log("crawlTopic read error: " + err);
        callback(err);
        return;
    }

    // article.content and article.html contain a 404 page here

    // Close article to clean up jsdom and prevent leaks
    article.close();
});
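
One way to narrow this down might be to fetch the page without the readability step and look at the raw status code and body. The sketch below uses only Node's built-in `http` module. `fetchRaw` and `summarize` are hypothetical helper names, not part of any library, and this is a diagnostic aid under those assumptions rather than a fix.

```javascript
const http = require('http');

// Hypothetical helper: fetch a page with Node's built-in http module and
// hand back the raw status code and body, bypassing the readability step.
function fetchRaw(url, headers, done) {
  http.get(url, { headers: headers }, function (res) {
    const chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      done(null, {
        statusCode: res.statusCode,
        body: Buffer.concat(chunks).toString('utf8')
      });
    });
  }).on('error', done);
}

// Hypothetical helper: reduce a raw response to the facts worth logging.
function summarize(response) {
  return {
    statusCode: response.statusCode,
    isError: response.statusCode >= 400,
    length: response.body.length
  };
}
```

Calling `fetchRaw('http://www.8wenku.com/book/2414', { 'User-Agent': '...' }, function (err, response) { console.log(err || summarize(response)); })` would show whether the server itself answers with a 404 status for this request. If the raw body looks correct, it could then be passed to `read` as an HTML string instead of a URL, assuming the library accepts HTML as its first argument. It may also be worth logging `meta.statusCode` inside the original callback, assuming `meta` is the underlying response object.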