Releases: apify/crawlee

v0.19.1

30 Jan 16:13
  • BREAKING (EXPERIMENTAL): session.checkStatus() was renamed to session.retireOnBlockedStatusCodes().
  • Session API is no longer considered experimental.
  • Updates documentation and introduces a few internal changes.
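
The new method name describes the behavior: a session is retired when a response status code suggests the crawler was blocked. The sketch below only illustrates that idea. SketchSession is a hypothetical stand-in, not the SDK's Session class, and the list of blocked status codes is an assumption made for the example.

```javascript
// Hypothetical stand-in for a session object; NOT the SDK's Session class.
// The blocked status codes below are an assumption for illustration.
const ASSUMED_BLOCKED_STATUS_CODES = [401, 403, 429];

class SketchSession {
    constructor() {
        this.usable = true;
    }

    // Marks the session as no longer usable.
    retire() {
        this.usable = false;
    }

    // Retires the session if the status code is in the blocked list.
    // Returns true when the session was retired by this call.
    retireOnBlockedStatusCodes(statusCode, blockedStatusCodes = ASSUMED_BLOCKED_STATUS_CODES) {
        const isBlocked = blockedStatusCodes.includes(statusCode);
        if (isBlocked) this.retire();
        return isBlocked;
    }
}

const session = new SketchSession();
console.log(session.retireOnBlockedStatusCodes(200)); // false
console.log(session.retireOnBlockedStatusCodes(403)); // true
console.log(session.usable); // false
```

Returning a boolean lets the caller decide whether to retry the request with a fresh session.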

v0.19.0

20 Jan 12:01
342c727
  • BREAKING: APIFY_LOCAL_EMULATION_DIR env var is no longer supported (deprecated on 2018-09-11).
    Use APIFY_LOCAL_STORAGE_DIR instead.
  • SessionPool API updates and fixes. The API is no longer considered experimental.
  • Logging of system info moved from require time to Apify.main() invocation.
  • Use native RegExp instead of xregexp for unicode property escapes.
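
Unicode property escapes are available in native JavaScript regular expressions when the u flag is set, which is what makes dropping xregexp possible on supported Node versions. The patterns below are illustrative only and are not the ones used inside the SDK.

```javascript
// Native Unicode property escapes require the `u` flag.
const LETTERS_ONLY = /^\p{L}+$/u;

console.log(LETTERS_ONLY.test('žluťoučký')); // true
console.log(LETTERS_ONLY.test('abc123')); // false

// Extract runs of letters from any script.
const words = 'Привет, 世界 and hello'.match(/\p{L}+/gu);
console.log(words.length); // 4
```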

v0.18.1

08 Jan 08:19
db460f5
  • Fix SessionPool not automatically working in CheerioCrawler.
  • Fix incorrect management of page count in PuppeteerPool.

v0.18.0

06 Jan 12:16
343366d
  • BREAKING: CheerioCrawler now ignores SSL errors by default (options.ignoreSslErrors: true).
  • Add SessionPool implementation to CheerioCrawler.
  • Add SessionPool implementation to PuppeteerPool and PuppeteerCrawler.
  • Fix Request constructor not making a copy of objects such as userData and headers.
  • Fix desc option not being applied in local dataset.getData().
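
The Request constructor fix concerns shared references: if a constructor stores the caller's userData or headers object directly, later changes to that object leak into every request built from it. A minimal sketch of the copying idea follows. SketchRequest is a hypothetical class, not the SDK's Request, and the shallow copy is an assumption (the SDK may copy differently).

```javascript
// Hypothetical stand-in, NOT the SDK's Request class.
class SketchRequest {
    constructor({ url, userData = {}, headers = {} }) {
        this.url = url;
        // Shallow copies: later changes to the caller's top-level
        // properties do not affect this request.
        this.userData = { ...userData };
        this.headers = { ...headers };
    }
}

const sharedUserData = { label: 'DETAIL' };
const first = new SketchRequest({ url: 'https://example.com/a', userData: sharedUserData });

// Mutating the original object after construction...
sharedUserData.label = 'LIST';

// ...leaves the request's copy untouched.
console.log(first.userData.label); // DETAIL
```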

v0.17.0

25 Nov 16:02
  • BREAKING: Node 8 and 9 are no longer supported. Please use Node 10.17.0 or higher.
  • DEPRECATED: Apify.callTask() body and contentType options are now deprecated.
    Use input instead. It must be of content-type: application/json.
  • Add default SessionPool implementation to BasicCrawler.
  • Add the ability to create ad-hoc webhooks via Apify.call() and Apify.callTask().
  • Add an example of form filling with Puppeteer.
  • Add country option to Apify.getApifyProxyUrl().
  • Add Apify.utils.puppeteer.saveSnapshot() helper to quickly save HTML and screenshot of a page.
  • Add the ability to pass options supported by got to requestOptions in CheerioCrawler,
    thus supporting things such as cookieJar again.
  • Switch Puppeteer to web socket again due to suspected pipe errors.
  • Fix an issue where some encodings were not correctly parsed in CheerioCrawler.
  • Fix parsing bad Content-Type headers for CheerioCrawler.
  • Fix custom headers not being correctly applied in Apify.utils.requestAsBrowser().
  • Fix dataset limits not being correctly applied.
  • Fix a race condition in RequestQueueLocal.
  • Fix RequestList persistence of downloaded sources in key-value store.
  • Fix Apify.utils.puppeteer.blockRequests() always including default patterns.
  • Fix inconsistent behavior of Apify.utils.puppeteer.infiniteScroll() on some websites.
  • Fix retry histogram statistics sometimes showing invalid counts.
  • Add regexps for YouTube videos (YOUTUBE_REGEX, YOUTUBE_REGEX_GLOBAL) to utils.social.
  • Add documentation for the json option in handlePageFunction of CheerioCrawler.
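
For the Apify.callTask() deprecation above, code that still passes body and contentType needs to supply a JSON input instead. The helper below is a hypothetical sketch of that conversion, not part of the SDK. The name toTaskInput and the startUrl field are made up for illustration.

```javascript
// Hypothetical helper, not part of the SDK: derive an `input` object
// from the deprecated `body` / `contentType` pair.
function toTaskInput({ body, contentType }) {
    if (body === undefined) return undefined;
    // `input` must be JSON, so other content types cannot be converted.
    if (!/^application\/json\b/i.test(contentType || '')) {
        throw new Error(`Cannot convert content type "${contentType}" to a JSON input`);
    }
    // A string body is parsed; an object body is passed through as-is.
    return typeof body === 'string' ? JSON.parse(body) : body;
}

const input = toTaskInput({
    body: '{"startUrl":"https://example.com"}',
    contentType: 'application/json; charset=utf-8',
});
console.log(input.startUrl); // https://example.com
```

The resulting object would then be passed as the input to Apify.callTask() in place of the old options.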

v0.16.1

31 Oct 10:34
  • Add useIncognitoPages option to PuppeteerPool to enable opening new pages in incognito
    browser contexts. This is useful to keep cookies and cache unique for each page.
  • Added options to load every content type in CheerioCrawler.
    The new body and contentType options in handlePageFunction serve this purpose.
  • DEPRECATED: CheerioCrawler html option in handlePageFunction was replaced with body option.

v0.16.0

30 Sep 09:51
  • Update @apify/http-request to version 1.1.2.
  • Update CheerioCrawler to use requestAsBrowser() to better disguise as a real browser.

v0.15.5

19 Aug 07:46
  • This release just updates some dependencies (not Puppeteer).

v0.15.4

02 Aug 10:22
  • DEPRECATED: dataset.delete(), keyValueStore.delete() and requestQueue.delete() methods have been deprecated in favor of *.drop() methods, because the drop name more clearly communicates the fact that those methods drop / delete the storage itself, not individual elements in the storage.
  • Added Apify.utils.requestAsBrowser() helper function that enables you to make HTTP(S) requests disguised as a browser (Firefox). This may help in overcoming certain anti-scraping and anti-bot protections.
  • Added options.gotoTimeoutSecs to PuppeteerCrawler to enable easier setting of navigation timeouts.
  • PuppeteerPool options that were deprecated from the PuppeteerCrawler constructor were finally removed. Please use maxOpenPagesPerInstance, retireInstanceAfterRequestCount, instanceKillerIntervalSecs, killInstanceAfterSecs and proxyUrls via the puppeteerPoolOptions object.
  • On the Apify Platform a warning will now be printed when using an outdated apify package version.
  • Apify.utils.puppeteer.enqueueLinksByClickingElements() will now print a warning when the nodes it
    tries to click become modified (detached from DOM). This is useful to debug unexpected behavior.
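
For the removed PuppeteerCrawler constructor options above, existing configuration objects need the five listed options moved under puppeteerPoolOptions. The helper below is a hypothetical migration sketch, not part of the SDK. Only the option names come from these notes.

```javascript
// Option names taken from the release note above.
const POOL_OPTION_NAMES = [
    'maxOpenPagesPerInstance',
    'retireInstanceAfterRequestCount',
    'instanceKillerIntervalSecs',
    'killInstanceAfterSecs',
    'proxyUrls',
];

// Hypothetical helper, not part of the SDK: returns a new options object
// with pool-related options moved under `puppeteerPoolOptions`.
function movePoolOptions(crawlerOptions) {
    const result = {
        ...crawlerOptions,
        puppeteerPoolOptions: { ...crawlerOptions.puppeteerPoolOptions },
    };
    for (const name of POOL_OPTION_NAMES) {
        if (name in crawlerOptions) {
            result.puppeteerPoolOptions[name] = crawlerOptions[name];
            delete result[name];
        }
    }
    return result;
}

const migrated = movePoolOptions({ gotoTimeoutSecs: 60, maxOpenPagesPerInstance: 10 });
console.log(JSON.stringify(migrated));
// {"gotoTimeoutSecs":60,"puppeteerPoolOptions":{"maxOpenPagesPerInstance":10}}
```

The helper returns a new object rather than mutating its argument, so the original configuration stays available for comparison.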

v0.15.3

29 Jul 12:15
  • Apify.launchPuppeteer() now accepts proxyUrl with the https, socks4
    and socks5 schemes, as long as it doesn't contain username or password.
    This is to fix Issue #420.
  • Added desiredConcurrency option to AutoscaledPool constructor and removed
    an unnecessary bounds check from the setter property.
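
The proxyUrl condition in the first note can be checked up front with the standard URL class. This is a hypothetical validator, not the SDK's implementation. Including plain http in the accepted schemes is an assumption, since the note only names https, socks4 and socks5 as newly accepted.

```javascript
// Schemes named in the note above, plus plain http (an assumption).
const ASSUMED_SCHEMES = ['http:', 'https:', 'socks4:', 'socks5:'];

// Hypothetical validator, not the SDK's implementation: true when the
// URL parses, uses one of the assumed schemes, and has no credentials.
function isProxyUrlWithoutCredentials(proxyUrl) {
    let parsed;
    try {
        parsed = new URL(proxyUrl);
    } catch (err) {
        return false; // not a parseable URL
    }
    const hasCredentials = parsed.username !== '' || parsed.password !== '';
    return ASSUMED_SCHEMES.includes(parsed.protocol) && !hasCredentials;
}

console.log(isProxyUrlWithoutCredentials('socks5://proxy.example.com:1080')); // true
console.log(isProxyUrlWithoutCredentials('https://user:pass@proxy.example.com:8000')); // false
console.log(isProxyUrlWithoutCredentials('not a url')); // false
```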