On macOS, RQ jobs fail if you import pyarrow (or pandas when pyarrow is also installed) [edit: or the requests library, or urllib.request from the standard library] in the module where you implement your task.
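The original snippet is not preserved in this copy of the issue. A minimal sketch of such a task module, assuming a file named hello.py with a function hello_world (names inferred from the `hello.hello_world` job in the worker log; the function body is hypothetical), might look like this. It imports urllib.request so that it needs only the standard library; per the report, `import pyarrow` in the same position triggers the same failure.

```python
# hello.py -- hypothetical task module; file and function names are
# inferred from the "hello.hello_world" job in the worker log.

# The module-level import is what matters here. According to the report,
# `import pyarrow`, `import pandas` (with pyarrow installed) or
# `import requests` in this position behave the same way on macOS.
import urllib.request


def hello_world():
    # The job body is irrelevant to the crash; the import above is enough.
    return "Hello, world!"
```

The job would then be enqueued in the usual way, for example with `Queue(connection=Redis()).enqueue(hello_world)`, and picked up by a worker started with `rq worker`.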
This produces an error in the rq worker output, which looks something like this:
18:35:15 default: hello.hello_world (c26c3735-eeb8-442f-a89e-75ecac6f1c92)
objc[34393]: +[NSString initialize] may have been in progress in another thread when fork() was called.
objc[34393]: +[NSString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
18:35:15 Moving job to FailedJobRegistry (Work-horse terminated unexpectedly; waitpid returned 6 (signal 6); )
I tried the same thing on Ubuntu and could not reproduce the error, so I assume it only happens on macOS.
One workaround for this issue is to import pyarrow in your own worker script, before the worker forks its process.
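The worker script from the original report is not preserved here. A rough sketch of this preloading workaround, assuming a Redis server on the default local address and the `default` queue seen in the log, could look like the following. The script name and the `main` helper are hypothetical, and urllib.request stands in for pyarrow so the preload line works without extra packages.

```python
# worker.py -- hypothetical custom worker script.

# Preload the problematic library in the parent process, before any fork.
# Replace this with `import pyarrow` (or pandas, requests, ...) as needed.
import urllib.request


def main():
    # rq and redis are third-party packages; they are imported inside
    # main() only so that this sketch can be loaded without them installed.
    from redis import Redis
    from rq import Worker

    # Listen on the "default" queue, as in the log above.
    worker = Worker(["default"], connection=Redis())
    worker.work()


if __name__ == "__main__":
    main()
```

You would run this script with `python worker.py` instead of the `rq worker` command, so that the import has already happened by the time the work-horse is forked.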
Another workaround I found was to run the rq worker as follows:
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES rq worker
Edit:
The issue turns out not to be specific to pyarrow. For instance, on macOS, if you import urllib.request from the standard library at the top of the module that implements your RQ jobs, you will encounter the exact same problem.
After further investigation, it turns out that urllib.request internally imports _scproxy, a darwin-specific extension module. Therefore, the root cause of this issue is likely the way macOS handles process forking.
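One way to check the _scproxy observation on your own machine is to look at sys.modules after importing urllib.request. This is a small sketch; the helper name `scproxy_loaded` is made up for illustration.

```python
import sys
import urllib.request  # on darwin this pulls in the _scproxy extension


def scproxy_loaded():
    # True if the macOS-only _scproxy module was imported as a side effect.
    return "_scproxy" in sys.modules


if __name__ == "__main__":
    print(sys.platform, scproxy_loaded())
```

On macOS the helper should report that the module is loaded; on other platforms _scproxy does not exist, so it should not be.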
While this issue could presumably be resolved by using an alternative forking method on the darwin platform in the worker implementation, it would also be worth documenting. That way, current users of RQ on macOS would be warned that they may hit this problem when using a library that depends on native macOS libraries, which can include parts of the Python standard library.
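To illustrate what an "alternative forking method" could mean, here is a sketch based on the standard multiprocessing module. It is not RQ's implementation; the helper names are hypothetical, and it only shows the general idea of starting the child with `spawn` on darwin instead of `fork`.

```python
import multiprocessing as mp
import sys


def start_method_for(platform=sys.platform):
    # Assumption for this sketch: use "spawn" on darwin, "fork" elsewhere.
    # ("fork" is not available on Windows.)
    return "spawn" if platform == "darwin" else "fork"


def run_job_in_child(func, *args):
    # Run func(*args) in a child process created with the chosen method.
    # With "spawn", func must be importable from a module (picklable).
    ctx = mp.get_context(start_method_for())
    proc = ctx.Process(target=func, args=args)
    proc.start()
    proc.join()
    return proc.exitcode
```

A spawned child starts from a fresh interpreter rather than a copy of the parent, which sidesteps the fork-safety check at the cost of slower start-up.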
admirito changed the title from "RQ Tasks Failed Upon Importing Pandas+Pyarrow" to "RQ Tasks Failed Upon Importing Pyarrow (on MacOS)" on Mar 18, 2024
You might want to cross-post this on the pyarrow repository. The community is very active and helpful. They might also be interested in this.
After further investigation, it was revealed that the issue could be reproduced by modules other than pyarrow. Therefore, I edited the issue and its title to reflect this fact.