
Releases: apify/crawlee

v3.5.3

31 Aug 07:46

3.5.3 (2023-08-31)

Bug Fixes

  • browser-pool: improve error handling when browser is not found (#2050) (282527f), closes #1459
  • clean up inProgress cache when delaying requests via sameDomainDelaySecs (#2045) (f63ccc0)
  • crawler instances with different StorageClients do not affect each other (#2056) (3f4c863)
  • pin all internal dependencies (#2041) (d6f2b17), closes #2040
  • respect current config when creating implicit RequestQueue instance (845141d), closes #2043

Features

  • core: add default dataset helpers to BasicCrawler (#2057) (e2a7544)

v3.5.2

21 Aug 12:40

3.5.2 (2023-08-21)

Bug Fixes

  • make the Request constructor options typesafe (#2034) (75e7d65)
  • pin @crawlee/* packages versions in crawlee metapackage (#2040) (61f91c7)
  • support DELETE requests in HttpCrawler (#2039) (7ea5c41), closes #1658

Features

v3.5.1

16 Aug 08:48

3.5.1 (2023-08-16)

Bug Fixes

  • add Request.maxRetries to the RequestOptions interface (#2024) (6433821)
  • log original error message on session rotation (#2022) (8a11ffb)

Features

  • exceeding maxSessionRotations calls failedRequestHandler (#2029) (b1cb108), closes #2028
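
To illustrate the 3.5.1 change, here is a minimal sketch. It assumes a simplified model in which each session error rotates the session and retries the request until a rotation budget runs out, at which point the request goes to the failed-request handler. Only `maxSessionRotations` and `failedRequestHandler` come from the entry above. `RequestState`, `onSessionError`, and the handler shape are hypothetical and are not Crawlee's API.

```typescript
// Hypothetical, simplified model of session rotation; not Crawlee's internal code.
interface RequestState {
    url: string;
    sessionRotationCount: number;
}

type Outcome = "rotate-and-retry" | "failed";

function onSessionError(
    request: RequestState,
    maxSessionRotations: number,
    failedRequestHandler: (req: RequestState, error: Error) => void,
    error: Error,
): Outcome {
    if (request.sessionRotationCount < maxSessionRotations) {
        // Still within budget: count this rotation and retry with a fresh session.
        request.sessionRotationCount++;
        return "rotate-and-retry";
    }
    // Budget exhausted: treat the request as failed so the user's handler runs.
    failedRequestHandler(request, error);
    return "failed";
}

// Usage: with maxSessionRotations = 2, the third session error fails the request.
const failed: string[] = [];
const req: RequestState = { url: "https://example.com", sessionRotationCount: 0 };
const handler = (r: RequestState, e: Error) => failed.push(`${r.url}: ${e.message}`);
const outcomes = [1, 2, 3].map(() => onSessionError(req, 2, handler, new Error("blocked")));
// outcomes: "rotate-and-retry", "rotate-and-retry", "failed"
```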

v3.5.0

31 Jul 06:53

3.5.0 (2023-07-31)

Bug Fixes

  • clean up worker-related code in memory storage to fix compatibility with vitest (#2004) (d2e098c), closes #1999
  • core: add requests from URL list (requestsFromUrl) to the queue in batches (418fbf8), closes #1995
  • core: support relative links in enqueueLinks explicitly provided via urls option (#2014) (cbd9d08), closes #2005

Features

  • add closeCookieModals context helper for Playwright and Puppeteer (#1927) (98d93bb)
  • add support for sameDomainDelaySecs (#2003) (e796883), closes #1993
  • basic-crawler: allow configuring the automatic status message (#2001) (3eb4e4c)
  • core: use RequestQueue.addRequestsBatched() in enqueueLinks helper (4d61ca9), closes #1995
  • retire session on proxy error (#2002) (8c0928b), closes #1912
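
The `sameDomainDelaySecs` option added in 3.5.0 spaces out requests to the same domain. The sketch below models only the bookkeeping behind such a delay, assuming requests are keyed by hostname and the caller is told how long to wait. The class `SameDomainDelay` and its method `msUntilAllowed` are hypothetical and do not reflect how Crawlee actually schedules or re-queues delayed requests.

```typescript
// Hypothetical sketch of a per-hostname delay check; not Crawlee's implementation.
class SameDomainDelay {
    private readonly delayMs: number;
    private readonly lastRequestAt = new Map<string, number>();

    constructor(delaySecs: number) {
        this.delayMs = delaySecs * 1000;
    }

    /**
     * Returns 0 if the URL's hostname may be requested now (and records the time),
     * otherwise the number of milliseconds the caller should wait first.
     */
    msUntilAllowed(url: string, nowMs: number): number {
        const host = new URL(url).hostname;
        const last = this.lastRequestAt.get(host);
        if (last !== undefined && nowMs - last < this.delayMs) {
            return this.delayMs - (nowMs - last);
        }
        this.lastRequestAt.set(host, nowMs);
        return 0;
    }
}

// Usage with a 2-second delay and an injected clock (milliseconds).
const limiter = new SameDomainDelay(2);
const waits = [
    limiter.msUntilAllowed("https://example.com/a", 0),    // first hit on the host: 0
    limiter.msUntilAllowed("https://example.com/b", 500),  // same host, 500 ms later: 1500
    limiter.msUntilAllowed("https://example.org/", 500),   // different host: 0
    limiter.msUntilAllowed("https://example.com/c", 2000), // delay has elapsed: 0
];
```

Passing the current time in explicitly keeps the check deterministic and easy to test; a real scheduler would read the clock itself.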

v3.4.2

19 Jul 14:11

3.4.2 (2023-07-19)

Bug Fixes

  • basic-crawler: limit internalTimeoutMillis in addition to requestHandlerTimeoutMillis (#1981) (8122622), closes #1766

Features

  • core: add RequestQueue.addRequestsBatched() that is non-blocking (#1996) (c85485d), closes #1995
  • retryOnBlocked detects blocked webpage (#1956) (766fa9b)
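
The non-blocking `RequestQueue.addRequestsBatched()` from 3.4.2 lets a crawl start before a large list of requests has been fully enqueued. Below is a minimal sketch of that pattern, assuming a caller-supplied async `addBatch` function: the first batch is awaited and the remaining batches are added in the background. `addInBatches`, `addedNow`, and `waitForAll` are hypothetical names, not the real method's signature or return shape.

```typescript
// Hypothetical sketch of "await the first batch, add the rest in the background".
type AddBatch<T> = (batch: T[]) => Promise<void>;

interface BatchedResult {
    addedNow: number;         // items added before the call returned
    waitForAll: Promise<void>; // resolves once every batch has been added
}

async function addInBatches<T>(
    items: T[],
    batchSize: number,
    addBatch: AddBatch<T>,
): Promise<BatchedResult> {
    const batches: T[][] = [];
    for (let i = 0; i < items.length; i += batchSize) {
        batches.push(items.slice(i, i + batchSize));
    }
    const [first = [], ...rest] = batches;
    // Await only the first batch so the caller can start working quickly.
    if (first.length > 0) await addBatch(first);
    // Remaining batches are added one after another without blocking the caller.
    // Callers should await (or attach a handler to) waitForAll to observe errors.
    const waitForAll = (async () => {
        for (const batch of rest) await addBatch(batch);
    })();
    return { addedNow: first.length, waitForAll };
}
```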

v3.4.1

13 Jul 12:22

3.4.1 (2023-07-13)

Bug Fixes

  • http-crawler: replace IncomingMessage with PlainResponse for context's response (#1973) (2a1cc7f), closes #1964

Features

  • jsdom,linkedom: expose document to crawler router context (#1950) (4536dc2)

v3.4.0

12 Jun 14:37

3.4.0 (2023-06-12)

Bug Fixes

Features

v3.3.3

31 May 11:26

3.3.3 (2023-05-31)

Bug Fixes

  • MemoryStorage: handle EXDEV errors when purging storages (#1932) (e656050)
  • set status message every 10 seconds and log it at debug level (#1918) (32aede6)

Features

  • add support for requestsFromUrl to RequestQueue (#1917) (7f2557c)
  • core: add Request.maxRetries to allow overriding the maxRequestRetries (#1925) (c5592db)
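
The `Request.maxRetries` override from 3.3.3 can be sketched as a simple precedence rule, assuming the per-request value, when present, replaces the crawler-wide `maxRequestRetries`. The `shouldRetry` function and the `RetryableRequest` type are hypothetical.

```typescript
// Hypothetical sketch: a per-request retry limit takes precedence over the crawler-wide one.
interface RetryableRequest {
    retryCount: number;
    maxRetries?: number; // optional per-request override
}

function shouldRetry(request: RetryableRequest, maxRequestRetries: number): boolean {
    // `??` (not `||`) so an explicit 0 means "never retry" instead of falling back.
    const limit = request.maxRetries ?? maxRequestRetries;
    return request.retryCount < limit;
}

const crawlerLimit = 3;
const decisions = [
    shouldRetry({ retryCount: 2 }, crawlerLimit),                // true: 2 < 3
    shouldRetry({ retryCount: 2, maxRetries: 1 }, crawlerLimit), // false: override is 1
    shouldRetry({ retryCount: 0, maxRetries: 0 }, crawlerLimit), // false: 0 disables retries
];
```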

v3.3.2

11 May 13:23

3.3.2 (2023-05-11)

Bug Fixes

  • MemoryStorage: cache requests in RequestQueue (#1899) (063dcd1)
  • respect config object when creating SessionPool (#1881) (db069df)

Features

  • allow running single crawler instance multiple times (#1844) (9e6eb1e), closes #765
  • HttpCrawler: add parseWithCheerio helper (#1906) (ff5f76f)
  • router: allow inline router definition (#1877) (2d241c9)
  • RQv2 memory storage support (#1874) (049486b)
  • support alternate storage clients when opening storages (#1901) (661e550)

v3.3.1

11 Apr 07:15

3.3.1 (2023-04-11)

Bug Fixes

  • infiniteScroll() not working in Firefox (#1826) (4286c5d), closes #1821
  • jsdom: add timeout to the window.load wait when runScripts are enabled (806de31)
  • jsdom: delay closing of the window and add some polyfills (2e81618)
  • jsdom: use no-op enqueueLinks in http crawlers when parsing fails (fd35270)
  • MemoryStorage: handling of readable streams for key-value stores when setting records (#1852) (a5ee37d), closes #1843
  • start status message logger after the crawl actually starts (5d1df7a)
  • status message - total requests (#1842) (710f734)
  • Storage: queue up opening storages to prevent issues in concurrent calls (#1865) (044c740)
  • templates: add missing '@types/node' peer dependency (#1860) (d37a7e2)
  • try to detect stuck request queue and fix its state (#1837) (95a9f94)
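
The 3.3.1 storage fix queues up storage opening so that concurrent calls do not interfere. The sketch below gets a similar guarantee with a different, swapped-in technique: concurrent `open()` calls for the same name share one in-flight promise, so initialisation runs only once. `StorageOpener` and its `create` callback are hypothetical and are not Crawlee's storage API.

```typescript
// Hypothetical sketch using promise de-duplication (not a literal queue).
class StorageOpener<S> {
    private readonly pending = new Map<string, Promise<S>>();
    private readonly create: (name: string) => Promise<S>;

    constructor(create: (name: string) => Promise<S>) {
        this.create = create;
    }

    open(name: string): Promise<S> {
        let promise = this.pending.get(name);
        if (!promise) {
            promise = this.create(name);
            this.pending.set(name, promise);
            // Forget failed attempts so a later call can try again;
            // callers still receive the rejection through the returned promise.
            promise.catch(() => this.pending.delete(name));
        }
        return promise;
    }
}
```

A serialising queue would also prevent the race; sharing the promise additionally guarantees that every concurrent caller gets the same instance.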

Features

  • add parseWithCheerio context helper to cheerio crawler (b336a73)
  • jsdom: add parseWithCheerio context helper (c8f0796)