Unable to find scrapy.cfg file to infer project data dir #200

Open
sulthonzh opened this issue Aug 20, 2019 · 1 comment

Comments

@sulthonzh
Contributor

sulthonzh commented Aug 20, 2019

[image: screenshot of the error]
I got an error message like this when I deployed a scrapy project to scrapyd, even though scrapy.cfg is included in the egg file.

I have deployed a scrapy project to scrapyd, but I think the problem is with spidermon, because without scrapyd it works fine.

@rvandam

rvandam commented Dec 6, 2023

Ran into this problem as well and wanted to document my findings.

This appears to be caused by a series of flawed assumptions between spidermon, scrapy and scrapyd. Spidermon's LocalStorageStatsHistoryCollector uses the data_path function from scrapy.utils.project to build a path for storing stats history. But data_path requires a scrapy.cfg file in your working directory or one of its parent directories. If you deploy via scrapyd-deploy, your local scrapy.cfg is never copied to the server (not even inside the deployed egg file). So scrapy raises an error, spidermon doesn't handle it gracefully, and your spider is killed (see screenshot above).
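To make the "working directory or higher" lookup concrete, here is a minimal sketch of that kind of upward search. It is not scrapy's actual code; `find_closest_cfg` is a hypothetical name used only for illustration.

```python
from pathlib import Path
from typing import Optional


def find_closest_cfg(start: Path, filename: str = "scrapy.cfg") -> Optional[Path]:
    """Hypothetical helper: look for `filename` in `start` and each parent directory."""
    start = start.resolve()
    # Check the starting directory first, then walk up towards the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    # Nothing found anywhere on the way up; this is the situation under scrapyd.
    return None
```

Calling `find_closest_cfg(Path.cwd())` from the directory scrapyd runs the spider in would return `None` if no scrapy.cfg exists there or above, which matches the failure described here.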

The only workaround I've found is to add a dummy scrapy.cfg to your working directory (kudos to a suggestion in a related scrapy issue from 8 years ago: scrapy/scrapy#1581 (comment)).

If you want the stats history stored somewhere else, it appears you can use the completely undocumented datadir section in your otherwise dummy scrapy.cfg (the one on your server, not the one in your project, which doesn't get deployed).

[datadir]
default = /path/to/somewhere/
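
As a rough illustration of how such a section could be read, the sketch below parses a scrapy.cfg with the standard library's configparser and falls back to a `.scrapy` directory next to the file. This is my assumption about the behaviour, not scrapy's implementation; `resolve_data_dir` is a hypothetical name.

```python
from configparser import ConfigParser
from pathlib import Path


def resolve_data_dir(cfg_path: Path, project: str = "default") -> Path:
    """Hypothetical helper: pick a data dir from the [datadir] section, if present."""
    parser = ConfigParser()
    parser.read(cfg_path)
    # Use the configured directory when [datadir] has an entry for this project.
    if parser.has_option("datadir", project):
        return Path(parser.get("datadir", project))
    # Otherwise assume a .scrapy directory beside the config file.
    return cfg_path.parent / ".scrapy"
```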

You might alternatively be able to deploy your project's scrapy.cfg by modifying the setup.py that scrapyd-deploy generates. I have not tried that approach.

Perhaps spidermon should use a different, less obscure mechanism for choosing a data path, or at the very least degrade more gracefully by disabling stats history and logging the failure.
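
A minimal sketch of what "degrade gracefully" could look like, assuming the path lookup is wrapped in a callable. Nothing here is spidermon's API: `DataDirUnavailable` and `stats_history_path` are hypothetical stand-ins for whatever exception and hook the real code would use.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class DataDirUnavailable(Exception):
    """Hypothetical stand-in for the error raised when no data dir can be inferred."""


def stats_history_path(resolve: Callable[[], str]) -> Optional[str]:
    """Return the stats history path, or None (with a warning) if it cannot be resolved."""
    try:
        return resolve()
    except DataDirUnavailable as exc:
        # Disable stats history instead of letting the exception kill the spider.
        logger.warning("Stats history disabled: %s", exc)
        return None
```

The caller would then skip loading and saving stats history whenever the returned path is `None`.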
