Monitoring process time for huge collection #161

Open
CharlesNepote opened this issue Jun 29, 2022 · 2 comments

@CharlesNepote

Tailing the log is great for this. Mongo provides a "percent complete" measurement for you. These operations can take a long time on huge collections.

Run against our database, it takes 75 minutes for 100,000 entries (see below), but there are 2,400,000+ in our whole database.

time mongo off --eval "var collection = 'products', limit = 100000" variety.js > off_schema_100000.txt
# (75 minutes)
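
For a rough sense of scale, the figures above can be extrapolated to the whole collection. This is only a sketch: it assumes the analysis time grows linearly with the number of documents, which may well not hold in practice, and `estimateMinutes` is a made-up helper name, not part of Variety.

```javascript
// Linear extrapolation from a sampled run to the full collection.
// Assumption (not verified): run time is proportional to document count.
function estimateMinutes(sampleDocs, sampleMinutes, totalDocs) {
  return (totalDocs / sampleDocs) * sampleMinutes;
}

// Figures quoted in this issue: 75 minutes for 100,000 of 2,400,000 entries.
const minutes = estimateMinutes(100000, 75, 2400000);
console.log(`${minutes} minutes, about ${minutes / 60} hours`);
// prints "1800 minutes, about 30 hours"
```

Since 2,400,000+ is a lower bound on the collection size, the result is a lower bound too under the same assumption.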

I don't know whether this can be sped up, but at the very least I would love to see some kind of "percent complete" measurement.

I know this is not strictly about Variety, but could you document how you do this in the readme.md? I could not find how to do this in the Mongo logs.

The script has been running for more than two days on my database.

By the way, thanks a lot for this nice and useful tool.

@CharlesNepote
Author

It's open data, so you can reproduce this if you want to test Variety on a huge Mongo database. See this quick and dirty "howto": https://wiki.openfoodfacts.org/Reusing_Open_Food_Facts_Data#MongoDB_dump

@JamesCropcho
Member

Hello, Charles:

Thank you for using Variety, and on behalf of everyone involved, thanks for your kind words!

Indeed, what you reference is there in the documentation (https://github.com/variety/variety#see-progress-when-analysis-takes-a-long-time).

I'm not sure why you are not getting "percent complete." Perhaps newer versions of the shell/logger do not provide this; that is one hypothesis.

Maybe it would be useful if you increased the verbosity of logging for your MongoDB service? I wish I could be of more help.
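
Besides the log, another place a progress figure can appear is the output of `db.currentOp()`, whose `inprog` entries carry a `progress` sub-document (`done` and `total`) for some kinds of operation. Whether the operation Variety runs reports it is not known here, so the following is only a sketch: `percentComplete` is a hypothetical helper, and the sample entry is a mock with made-up values rather than real server output.

```javascript
// Hypothetical helper: convert one db.currentOp().inprog entry to a percentage.
// Returns null when the entry has no usable progress counters.
function percentComplete(op) {
  const p = op && op.progress;
  if (!p) return null;
  const done = Number(p.done);
  const total = Number(p.total);
  if (!Number.isFinite(done) || !Number.isFinite(total) || total <= 0) return null;
  return (100 * done) / total;
}

// Mock entry shaped like a currentOp document (values are invented).
const sample = { opid: 1, msg: "example", progress: { done: 25000, total: 100000 } };
console.log(percentComplete(sample));      // prints 25
console.log(percentComplete({ opid: 2 })); // prints null
```

Against a live server, the same function could be applied to each element of `db.currentOp().inprog`. Log verbosity itself can be raised from the shell with `db.setLogLevel(1)`.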

Perhaps someone else here knows?

Good Luck,

James
Creator of Variety
