Thread pools for faster builds? #225

Open
Syonyk opened this issue Jan 9, 2021 · 6 comments · May be fixed by #282 or #283

Comments

@Syonyk

Syonyk commented Jan 9, 2021

I really enjoy how jekyll_picture_tag works, but it's quite slow rendering a full site with a lot of images. I've noticed it's purely single threaded in operation, when the picture resizing could easily be done in parallel.

Has there been any consideration of using a thread pool or some other technique to allow for image conversions in parallel? It should speed rendering significantly on a multi-core machine.

I'm not all up to speed with Ruby development, but I could take a stab at it if nobody else has cycles to poke at this.

@Syonyk
Author

Syonyk commented Jan 10, 2021

Actually, looking at the flow, I'm not sure this is possible - it looks like jpt is called by Jekyll for each tag, so without multithreading Ruby, it wouldn't be possible to do a document in parallel - just each image. So, perhaps some speedups, but not as much as being able to throw the whole document worth of resizing into a thread pool.

@rbuchberger
Owner

Thanks for the ideas! I was considering some sort of multithreading a while back, but ultimately never put the time into making it work. You're right that Jekyll runs the show; multithreading tags would require some pretty big changes to Jekyll itself. That said, each picture tag generates multiple images, so we might be able to do that in parallel. Image generation is by far the most expensive operation JPT does; everything else is basically free by comparison.

I don't have any experience at all writing multithreaded code, so if you're offering assistance in making this happen I'd certainly appreciate it.

Note that we're also looking at moving from imagemagick to libvips, which should see a significant performance increase as well.

@Syonyk
Author

Syonyk commented Jun 6, 2021

I've messed around a little, and I am far from skilled in Ruby. I'm a low level C guy by trade.

However, I have figured out how to reasonably get threaded builds going, at least for each type of image. I expect one could probably expand this up higher, have threads for each image type and go, but this at least proves the concept a bit.

In srcsets/basic.rb:

    def build_files
      # By 'files', we mean the GeneratedImage class.
      return target_files if target_files.all?(&:exists?)

      files = checked_targets

      # Calling generate triggers GeneratedImage to actually build an image
      # file. This replaces the sequential files.each(&:generate) with one
      # thread per file.
      print "Creating threads\n"
      threads = files.map { |file| Thread.new { file.generate } }

      # Wait for every thread to finish before returning.
      threads.each(&:join)
      print "Joined threads\n"

      files
    end

This will generate all the webp files in parallel, then all the jpgs in parallel, etc. However, it does not generate all the files for the image at once (webp and jpg are sequential). I'm not entirely sure what calls this for each image type.

One might just create a global thread pool and toss the spaghetti at the wall, but my initial attempts at this (just eliminate the thread join) rather rapidly blew up task memory and I don't think it's a welcome enhancement for most people. I'm also far from certain the threads would actually complete prior to the render thread ending.
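One way to keep memory bounded and still be sure every image finishes is a fixed-size worker pool fed from a queue. The following is a minimal sketch using only Ruby's standard library; `FakeImage` and `generate_all` are hypothetical names for illustration, not part of jekyll_picture_tag, and the pool size is an arbitrary assumption:

```ruby
# Hypothetical stand-in for JPT's GeneratedImage; only `generate` matters here.
FakeImage = Struct.new(:name, :done) do
  def generate
    self.done = true # a real implementation would resize and write a file
  end
end

# Run `generate` on every image using at most `pool_size` threads.
def generate_all(images, pool_size: 4)
  queue = Queue.new
  images.each { |image| queue << image }
  queue.close # workers see "no more work" once the queue drains

  workers = Array.new(pool_size) do
    Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (image = queue.pop)
        image.generate
      end
    end
  end

  # Block until every worker finishes, so nothing is still running
  # when the caller returns.
  workers.each(&:join)
  images
end

images = Array.new(10) { |i| FakeImage.new("img-#{i}.webp", false) }
generate_all(images, pool_size: 3)
```

Because only `pool_size` conversions run at once, peak memory no longer grows with the number of images, and the final `join` means the caller cannot return while work is still pending.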

Anyway, I don't know if this is something you're interested in pursuing further, but the proof of concept definitely indicates it should be doable.

And if you can point me upstream to what calls each srcset generator, I could add some threading there, too.

Rendering images certainly dominates my site build time.

@rbuchberger
Owner

> I'm a low level C guy by trade.

Cool :)

Regarding the code, I believe what you've written would actually generate all of the widths for a particular srcset in parallel. Each srcset will have files of all the same format, the only difference will be their sizes.

The image generation logic somewhat follows the output markup; the whole party is kicked off by instantiating the correct output format (class) and calling to_s (to string) on it. Each class involved instantiates what it needs, which is why the srcset class is directing image generation. Since <picture> tags are the only tags which can contain multiple <source> tags, that's where the srcsets are instantiated. (Source tags are simple enough that they don't need their own class.)

I like where you're going with this. I think we could move the thread pool up in scope, something like PictureTag.threads callable from anywhere, and whenever we need to generate an image we can hand it off to a thread. Then we can join all the threads at the very end, before returning the final markup.
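As a rough sketch of that idea, a module could hold the running threads and expose one call to start work and another to wait for all of it. The module name `ImageThreads` and its methods `run` and `join_all` are hypothetical, chosen for illustration; nothing here is existing JPT API:

```ruby
# Hypothetical registry of in-flight image threads, reachable from anywhere.
module ImageThreads
  @threads = []
  @lock = Mutex.new

  class << self
    # Start a block of work on a new thread and remember the thread.
    def run(&block)
      thread = Thread.new(&block)
      @lock.synchronize { @threads << thread }
      thread
    end

    # Wait for every registered thread, then forget them.
    # Returns the number of threads that were joined.
    def join_all
      pending = @lock.synchronize do
        taken = @threads
        @threads = []
        taken
      end
      pending.each(&:join)
      pending.size
    end
  end
end

# Usage: hand off work while building markup, join before returning it.
results = Queue.new
[400, 800, 1200].each do |width|
  ImageThreads.run { results << width } # stand-in for image generation
end
joined = ImageThreads.join_all
```

This version starts one thread per image, so it only stays bounded if `join_all` is called at the end of each tag; capping the number of threads would need a real pool.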

The one hitch is that <img> tags must have a src, which we also generate an image for, and which might have the same width as one of the images in the srcset. We'll have to check for that to make sure we don't try to build the same image twice.
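One possible way to handle that check is to collect every target first and drop duplicates by output filename before generating anything. A small sketch, assuming each target exposes a unique output name; `Target` and `unique_targets` are hypothetical names, not JPT classes:

```ruby
# Hypothetical description of an image to be generated.
Target = Struct.new(:name, :width)

# Keep the first target for each output filename, preserving order.
def unique_targets(targets)
  targets.uniq(&:name)
end

srcset = [
  Target.new('photo-400.jpg', 400),
  Target.new('photo-800.jpg', 800)
]
# The <img> src fallback happens to match one of the srcset widths.
fallback = Target.new('photo-800.jpg', 800)

to_build = unique_targets(srcset + [fallback])
```

De-duplicating up front means two threads never race to write the same file, rather than relying on an existence check that both threads could pass at the same moment.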

@Syonyk
Author

Syonyk commented Jun 6, 2021

> Regarding the code, I believe what you've written would actually generate all of the widths for a particular srcset in parallel. Each srcset will have files of all the same format, the only difference will be their sizes.

Correct. It generates all the webp in parallel, then all the jpg in parallel, etc.

Anyway, I really don't know Ruby well enough to do much more than what I've done, which at least helps my use cases for my renders (I generally don't rerender a ton, but some of my posts are photo heavy). If it gets done, it would be awesome, but doing global thread pools and such is well beyond my experience level with Ruby.

@rbuchberger
Owner

Thanks for what you've figured out so far. We'll take another crack at it.

aebrahim added a commit to aebrahim/jekyll_picture_tag that referenced this issue Sep 21, 2022
@aebrahim aebrahim linked a pull request Sep 21, 2022 that will close this issue
aebrahim added a commit to aebrahim/jekyll_picture_tag that referenced this issue Sep 22, 2022
Fixes rbuchberger#225

This implements a global Concurrent::ThreadPoolExecutor from
concurrent-ruby to avoid memory blowup, and de-duplicates files before
generation so we no longer need to rely on filesystem consistency to
avoid double-generating images.

This is an alternative to rbuchberger#282 with a little bit more complexity, but with
the added benefit that all image generation can happen in a single
ThreadPool.
@aebrahim aebrahim linked a pull request Sep 22, 2022 that will close this issue