Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a global threadpool for faster builds. #283

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aebrahim
Copy link

Fixes #225

This implements a global Concurrent::ThreadPoolExecutor from concurrent-ruby to avoid memory blowup, and de-duplicates files before generation so we no longer need to rely on filesystem consistency to avoid double-generating images.

This is an alternate to #282 with a little bit more complexity, but with the added benefits that all image generation can happen in a single ThreadPoolExecutor.

Fixes rbuchberger#225

This implements a global Concurrent::ThreadPoolExecutor from
concurrent-ruby to avoid memory blowup, and de-duplicates files before
generation so we no longer need to rely on filesystem consistency to
avoid double-generating images.

This is an alternate to rbuchberger#282 with a litle bit more complexity, but with
the added benefits that all image generation can happen in a single
ThreadPool.
@rbuchberger
Copy link
Owner

This is amazing work! It's not so complicated considering the benefits it provides. I like this version better, though it might be worth testing a few builds to make sure it's faster by enough to justify the added complexity.

I'm still on a work trip, get home Saturday, so the earliest I can pull this down and test it will be monday. Just from reading, my initial criticism is I don't super like the test_pool because I'd like the tests and production to be as similar as possilbe, and I'd like to minimize the amount of code in production which deals with testing. I haven't tried to find another way, so it might just be our only option.

The tests on this project are a bit of a hot mess (sorry about that, I had learning to do), but I'd like to see a few tests of this behavior as well. I can handle that part if you're not up for it. We should also add a configuration option to disable it, in case someone has issues. (This code gets run in all sorts of weird environments)

Again, thank you so much! this is as big of an improvement as switching from imagemagick to libvips, which was huge.

Instead, we add a wait method, and ensure unit tests call wait before
checking for expectations. This involved a litle bit of refactoring to
have all tests call wait in the before_teardown hook.
@aebrahim
Copy link
Author

Thanks for the kind words @rbuchberger! And thanks for this package - it's so useful!

I refactored the tests so I could remove start_test_pool - please let me know what you think when you get back. It's been a fun way to learn a little bit of ruby :P

@Syonyk
Copy link

Syonyk commented Sep 25, 2022

If I could request one quick improvement, as someone who's dorked around with threading this in the past for performance (your method is better than mine, for sure!):

Can the number of parallel threads be configurable in the config files somewhere? Or, if it already is, document how to adjust it. I use some systems that are high on CPU count to RAM ratios (one of my little ARM boxes is 6C/4GB), and some of the resizes can use enough RAM to really choke the rest of the system out. Thanks!

@aebrahim aebrahim marked this pull request as draft September 27, 2022 00:28
@aebrahim
Copy link
Author

This approach does not seem to actually work - it still creates one threadpool per tag. Back to the drawing board for a bit to make it a full global

@rbuchberger
Copy link
Owner

That's right - every {% picture mypicture.jpg %} called in a site creates a unique instance of JPT.

There's a context object that is handed around during the site build. You might need to store relevant information there, and use some hooks to ensure that threads are cleaned up.

@aebrahim
Copy link
Author

I the code working by changing start_pool to ensure_pool_started and stopping the pool with a hook:

Jekyll::Hooks.register :site, :post_render do
  PictureTag::Pool.stop_pool
end

However, I'm having a lot of trouble testing it like that. The hardest part for me has been figuring out how to stub an option globally - I added a threads option, but I'm not sure of a good way to consistently read it in test without race conditions.

@rbuchberger
Copy link
Owner

rbuchberger commented Sep 29, 2022

@aebrahim - We're out of my area of experience here - I've never played with multithreaded Ruby code. Closest I've gotten is asynchronous javascript.

Our tests are synchronous, so would calling ensure_pool_started during setup and stop_pool during teardown work? For tests which verify file presence, we'd have to call stop pool before looking for them, but there aren't a huge number of those.

Edit: I commented before reading your new commit, looks like that's pretty much what you're doing. Hmm, I'll have to do some learning here

@aebrahim
Copy link
Author

I suspect most of my problems are coming from trying to read parameters. Maybe I will update this to remove all the parameter reading code and see if I can get tests to pass that way, and then leave it for someone else to figure out how to read parameters? WDYT?

@rbuchberger
Copy link
Owner

Hmm... Not sure I follow - which parameters do you mean?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thread pools for faster builds?
3 participants