Hey @popstas
This is a valid proposal; I had the same issue. Yes, please open the PR, and please don't forget to add the related info to the docs. It has been a while since you posted this, so please let me know if you are still willing to do it.
It has been 2 years since the issue was opened, but for anyone looking for the current URL in the future: it is available in the result object of a customCrawl, specifically result.options.url. Something like this should do the trick:
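A minimal sketch of that idea, assuming the result returned by crawl() carries the URL at result.options.url as described in this comment. The helper name urlFromResult and the currentUrl field are hypothetical, added only for illustration, and the wiring into the crawler is shown as a comment rather than as tested code.

```javascript
// Hypothetical helper: read the crawled URL from a customCrawl result.
// Assumes the shape { options: { url } } described in the comment above.
function urlFromResult(result) {
  return result && result.options ? result.options.url : undefined;
}

// Sketch of a customCrawl callback. crawl() performs the request and
// resolves to the result object, so the URL is only known after awaiting it.
async function customCrawl(page, crawl) {
  const result = await crawl();
  // `currentUrl` is an illustrative extra field, not part of the library.
  result.currentUrl = urlFromResult(result);
  return result;
}

// Assumed wiring, if headless-chrome-crawler is installed:
// const HCCrawler = require('headless-chrome-crawler');
// const crawler = await HCCrawler.launch({ customCrawl });
```

Note that this reads the URL after the request has run, so on its own it does not cover the case of skipping the request entirely.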
What is the current behavior?
No information about current URL in customCrawl()
What is the motivation / use case for changing the behavior?
I want to skip the request but still add the URL to a CSV for some file types, such as zip, doc, and pdf.
My code that does this: https://github.com/viasite/sites-scraper/blob/59449b1b03/src/scrap-site.js#L240-L255
Proposal
Add crawler to customCrawl:
customCrawl: async (page, crawl, crawler)
I tried to store the current URL using the requeststarted event, but that fails when concurrency > 1.
What do you think about it? I can make a PR.
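To illustrate why a single shared variable breaks with concurrency > 1, here is a small self-contained simulation. It does not use the crawler at all: simulateCrawl and run are hypothetical stand-ins, with the assignment playing the role of a requeststarted handler and the later read playing the role of customCrawl.

```javascript
// Shared variable, as it would be set from a 'requeststarted' handler.
let currentUrl = null;

// Hypothetical stand-in for one crawl: the event fires, the page load
// yields to other crawls, then customCrawl reads the shared variable.
async function simulateCrawl(url, seen) {
  currentUrl = url;          // 'requeststarted' fires for this URL
  await Promise.resolve();   // asynchronous work lets other crawls start
  seen.push({ requested: url, observed: currentUrl });
}

// Start all crawls at once, as a crawler with concurrency > 1 would.
async function run(urls) {
  const seen = [];
  await Promise.all(urls.map((url) => simulateCrawl(url, seen)));
  return seen;
}
```

With two or more URLs started together, every crawl observes the URL of the crawl that started last, because each later start overwrites the variable before the earlier crawls read it. Passing the URL (or the crawler) into customCrawl directly would avoid relying on shared state.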