Releases: apify/crawlee
v0.19.1
v0.19.0
- BREAKING: `APIFY_LOCAL_EMULATION_DIR` env var is no longer supported (deprecated on 2018-09-11). Use `APIFY_LOCAL_STORAGE_DIR` instead.
- `SessionPool` API updates and fixes. The API is no longer considered experimental.
- Logging of system info moved from `require` time to `Apify.main()` invocation.
- Use native `RegExp` instead of `xregexp` for unicode property escapes.
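For readers unfamiliar with the feature named in the last item, here is a minimal sketch of unicode property escapes with a native `RegExp`. The pattern and sample strings are illustrative assumptions and are not taken from the SDK source.

```javascript
// Matches strings made only of letters from any script.
// The `u` flag is required for \p{...} to be treated as a property escape.
const LETTERS_ONLY = /^\p{L}+$/u;

// Illustrative inputs: Latin, Cyrillic, CJK, and a string containing digits.
const samples = ['crawler', 'Прага', '東京', 'abc123'];
const results = samples.map((s) => LETTERS_ONLY.test(s));

console.log(results); // [ true, true, true, false ]
```

No extra dependency is needed for this kind of pattern, which is presumably what made dropping `xregexp` possible.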
v0.18.1
v0.18.0
- BREAKING: `CheerioCrawler` ignores SSL errors by default (`options.ignoreSslErrors: true`).
- Add `SessionPool` implementation to `CheerioCrawler`.
- Add `SessionPool` implementation to `PuppeteerPool` and `PuppeteerCrawler`.
- Fix `Request` constructor not making a copy of objects such as `userData` and `headers`.
- Fix `desc` option not being applied in local `dataset.getData()`.
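The `Request` constructor fix concerns aliasing: when a constructor stores the caller's objects by reference, later mutations by the caller leak into the instance. The sketch below shows the defensive-copy idea with a hypothetical `RequestSketch` class. It is not the SDK's `Request` implementation, and the use of shallow copies is an assumption.

```javascript
// Hypothetical stand-in for illustration only; not the SDK's Request class.
class RequestSketch {
    constructor({ url, userData = {}, headers = {} }) {
        this.url = url;
        // Shallow-copy so later changes to the caller's objects
        // do not affect this instance.
        this.userData = { ...userData };
        this.headers = { ...headers };
    }
}

const sharedUserData = { label: 'DETAIL' };
const request = new RequestSketch({ url: 'https://example.com', userData: sharedUserData });

// Mutating the original object after construction leaves the request untouched.
sharedUserData.label = 'LIST';
console.log(request.userData.label); // DETAIL
```

Without the copies, reusing one `userData` object across several requests would make them all observe the last mutation.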
v0.17.0
- BREAKING: Node 8 and 9 are no longer supported. Please use Node 10.17.0 or higher.
- DEPRECATED: `Apify.callTask()` `body` and `contentType` options are now deprecated. Use `input` instead. It must be of `content-type: application/json`.
- Add default `SessionPool` implementation to `BasicCrawler`.
- Add the ability to create ad-hoc webhooks via `Apify.call()` and `Apify.callTask()`.
- Add an example of form filling with `Puppeteer`.
- Add `country` option to `Apify.getApifyProxyUrl()`.
- Add `Apify.utils.puppeteer.saveSnapshot()` helper to quickly save HTML and screenshot of a page.
- Add the ability to pass `got` supported options to `requestOptions` in `CheerioCrawler`, thus supporting things such as `cookieJar` again.
- Switch Puppeteer to web socket again due to suspected `pipe` errors.
- Fix an issue where some encodings were not correctly parsed in `CheerioCrawler`.
- Fix parsing bad Content-Type headers for `CheerioCrawler`.
- Fix custom headers not being correctly applied in `Apify.utils.requestAsBrowser()`.
- Fix dataset limits not being correctly applied.
- Fix a race condition in `RequestQueueLocal`.
- Fix `RequestList` persistence of downloaded sources in key-value store.
- Fix `Apify.utils.puppeteer.blockRequests()` always including default patterns.
- Fix inconsistent behavior of `Apify.utils.puppeteer.infiniteScroll()` on some websites.
- Fix retry histogram statistics sometimes showing invalid counts.
- Added regexps for YouTube videos (`YOUTUBE_REGEX`, `YOUTUBE_REGEX_GLOBAL`) to `utils.social`.
- Added documentation for option `json` in `handlePageFunction` of `CheerioCrawler`.
v0.16.1
- Add `useIncognitoPages` option to `PuppeteerPool` to enable opening new pages in incognito browser contexts. This is useful to keep cookies and cache unique for each page.
- Added options to load every content type in `CheerioCrawler`. There are new options `body` and `contentType` in `handlePageFunction` for this purpose.
- DEPRECATED: `CheerioCrawler` `html` option in `handlePageFunction` was replaced with `body` option.
v0.16.0
v0.15.5
v0.15.4
- DEPRECATED: `dataset.delete()`, `keyValueStore.delete()` and `requestQueue.delete()` methods have been deprecated in favor of `*.drop()` methods, because the `drop` name more clearly communicates the fact that those methods drop / delete the storage itself, not individual elements in the storage.
- Added `Apify.utils.requestAsBrowser()` helper function that enables you to make HTTP(S) requests disguised as a browser (Firefox). This may help in overcoming certain anti-scraping and anti-bot protections.
- Added `options.gotoTimeoutSecs` to `PuppeteerCrawler` to enable easier setting of navigation timeouts.
- `PuppeteerPool` options that were deprecated from the `PuppeteerCrawler` constructor were finally removed. Please use `maxOpenPagesPerInstance`, `retireInstanceAfterRequestCount`, `instanceKillerIntervalSecs`, `killInstanceAfterSecs` and `proxyUrls` via the `puppeteerPoolOptions` object.
- On the Apify Platform a warning will now be printed when using an outdated `apify` package version.
- `Apify.utils.puppeteer.enqueueLinksByClickingElements()` will now print a warning when the nodes it tries to click become modified (detached from DOM). This is useful to debug unexpected behavior.
v0.15.3
- `Apify.launchPuppeteer()` now accepts `proxyUrl` with the `https`, `socks4` and `socks5` schemes, as long as it doesn't contain username or password. This is to fix Issue #420.
- Added `desiredConcurrency` option to `AutoscaledPool` constructor, removed unnecessary bound check from the setter property
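
The `proxyUrl` condition in the v0.15.3 notes can be expressed as a small check. This is a sketch under assumptions: `isAllowedByNewSchemeRule` is a hypothetical helper, not part of the SDK. It models only the rule stated for the `https`, `socks4` and `socks5` schemes and says nothing about how other schemes are handled.

```javascript
// Hypothetical helper; models only the rule stated in the release note.
const CREDENTIAL_FREE_SCHEMES = ['https:', 'socks4:', 'socks5:'];

function isAllowedByNewSchemeRule(proxyUrl) {
    // The WHATWG URL parser exposes scheme and credentials separately.
    const { protocol, username, password } = new URL(proxyUrl);
    if (!CREDENTIAL_FREE_SCHEMES.includes(protocol)) return false;
    // These schemes are accepted only when no credentials are embedded.
    return username === '' && password === '';
}

console.log(isAllowedByNewSchemeRule('socks5://proxy.example.com:1080')); // true
console.log(isAllowedByNewSchemeRule('socks5://user:secret@proxy.example.com:1080')); // false
```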