[twisted] CRITICAL Unhandled error in Deferred:
Traceback (most recent call last):
File "/Users/honza/.local/share/mise/installs/python/3.11/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/twisted/internet/asyncioreactor.py", line 271, in _onTimer
self.runUntilCurrent()
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/twisted/internet/base.py", line 994, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/twisted/internet/task.py", line 680, in _tick
taskObj._oneWorkUnit()
--- <exception caught here> ---
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/twisted/internet/task.py", line 526, in _oneWorkUnit
result = next(self._iterator)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/scrapy/utils/defer.py", line 102, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/scrapy/core/scraper.py", line 298, in _process_spidermw_output
self.crawler.engine.crawl(request=output)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/scrapy/core/engine.py", line 290, in crawl
self._schedule_request(request, self.spider)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/scrapy/core/engine.py", line 297, in _schedule_request
if not self.slot.scheduler.enqueue_request(request): # type: ignore[union-attr]
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/apify/scrapy/scheduler.py", line 87, in enqueue_request
apify_request = to_apify_request(request, spider=self.spider)
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/apify/scrapy/requests.py", line 76, in to_apify_request
scrapy_request_dict_encoded = codecs.encode(pickle.dumps(scrapy_request_dict), 'base64').decode()
File "/Users/honza/Projects/juniorguru-plucker/.venv/lib/python3.11/site-packages/parsel/selector.py", line 532, in __getstate__
raise TypeError("can't pickle Selector objects")
builtins.TypeError: can't pickle Selector objects
honzajavorek changed the title from "Twisted/CRITICAL: builtins.TypeError: can't pickle Selector objects" to "Twisted/CRITICAL: builtins.TypeError: can't pickle Selector objects (Scrapy)" on Mar 5, 2024
Thank you @honzajavorek for reporting this. I've opened PR #191, which should improve the error handling in to_apify_request. Also, the ApifyScheduler should let the user know that the request was not scheduled for this reason.
My spider https://github.com/juniorguru/plucker/blob/26d1758e310b8b2451541516cf4447e4a5e4a11a/juniorguru_plucker/jobs_jobscz/spider.py runs just fine with Scrapy, but fails with critical errors when teaming up with Apify.
See exception details 💌
When debugging the problem, I figured out that the following line causes it:
Inspecting problematic dicts, the culprit seems to be the fact that I pass a response object around:
Then the response comes in the dict like this:
The <200 https://example.com/.../> is a representation of the Response, which probably cannot be pickled, or at least some Selector objects inside it cannot.
I don't think you can do much about it; it's probably a limitation of delegating the request mechanics to an external system such as Apify. If the request needs to be serialized and later deserialized, there's just no way to pass around something which Python cannot pickle.
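For illustration (the class name here is mine, not from the spider), the failure can be reproduced with plain stdlib pickle and a stand-in object that, like parsel's Selector, raises in __getstate__:

```python
import pickle


class FakeSelector:
    """Stand-in for parsel.Selector, whose __getstate__ raises TypeError."""

    def __getstate__(self):
        raise TypeError("can't pickle Selector objects")


# A request dict that transitively carries a selector, e.g. via cb_kwargs
request_dict = {
    "url": "https://example.com/",
    "cb_kwargs": {"response": FakeSelector()},
}

try:
    pickle.dumps(request_dict)
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # the same TypeError as in the traceback

# Passing plain data instead of the live response round-trips fine
request_dict["cb_kwargs"] = {"response_url": "https://example.com/", "status": 200}
restored = pickle.loads(pickle.dumps(request_dict))
```

The practical workaround on the spider side is the same idea: extract whatever primitives you need from the response before putting them into cb_kwargs or meta.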
So I think the only solution here is to fail nicely. The line which pickles the request should catch the exception and provide a nicer error message explaining what is happening and why, ideally with some guidance on how to avoid the problem. I'll get back here if I come up with a workaround.
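A minimal sketch of that "fail nicely" idea, assuming the call site from the traceback (the helper name and the error message are mine, not the actual fix):

```python
import codecs
import pickle
import threading


def encode_request_dict(scrapy_request_dict: dict) -> str:
    """Pickle and base64-encode a request dict, failing with guidance."""
    try:
        return codecs.encode(pickle.dumps(scrapy_request_dict), "base64").decode()
    except (TypeError, pickle.PicklingError) as exc:
        raise ValueError(
            "Could not serialize the Scrapy request for the Apify request "
            "queue. The request likely carries unpicklable objects, such as "
            "a Response or Selector in cb_kwargs/meta; pass plain data "
            "(URLs, strings, numbers) instead."
        ) from exc


# Picklable requests still encode as before
encoded = encode_request_dict({"url": "https://example.com/"})

# Unpicklable payloads now produce an actionable error
try:
    encode_request_dict({"cb_kwargs": {"lock": threading.Lock()}})
    friendly_error = None
except ValueError as exc:
    friendly_error = str(exc)
```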