Single-Threaded? #65

Open
rainabba opened this issue Oct 25, 2017 · 5 comments

Comments

@rainabba

A few days ago I discovered that many of my transformations using stylesheet.apply() were not completing in a reasonable period of time and it looks as if only a single process is being used. Is that expected behavior? Is there a way to get multi-threaded behavior? Any idea of what the practical limits are in terms of size/nodes (in case I'm somewhere near that)?

@petershaw

Isn't Node always single-threaded? Even a C library called through libuv is single-threaded when the parent app is not hosting threads. Or am I wrong?

@rainabba
Author

Node is, but external binaries (like this one) need not be, and even if they return synchronously, they can still use many threads internally to get their work done. This is how OpenCV and GraphicsMagick both behave in their respective packages.

@albanm
Owner

albanm commented Aug 25, 2018

If you use the async API of this lib, the application of the stylesheet is performed in a Nan::AsyncWorker. I don't have a very clear idea of what is going on, but my understanding is that the task is executed in a thread from a pool of worker threads managed by libuv. So yes, in my mind there was multithreading of transformations in the async mode. But I didn't get far in checking that aspect. And unfortunately, the parsing (both of the stylesheet and the input document) is done in the main thread, not in a worker.
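To make the thread-pool behaviour described above concrete, here is a minimal sketch that uses only Node's built-in `crypto.pbkdf2` as a stand-in for any native async task; it does not call this library at all. The helper names `poolSize` and `hashInPool` are made up for illustration. It assumes the usual libuv behaviour: each queued task runs on one pool thread, and the pool has 4 threads unless `UV_THREADPOOL_SIZE` is set before the pool is first used. If that holds for this module too, a single `apply()` call would occupy only one thread, and several cores would be busy only when several calls are in flight at once.

```javascript
'use strict';
const crypto = require('crypto');

// Report the thread pool size libuv is expected to use: the value of
// UV_THREADPOOL_SIZE if it is a positive integer, otherwise libuv's default of 4.
function poolSize(env = process.env) {
  const n = parseInt(env.UV_THREADPOOL_SIZE, 10);
  return Number.isInteger(n) && n > 0 ? n : 4;
}

// Start `count` CPU-heavy async calls at once. Each call is queued on the
// libuv thread pool, so up to poolSize() of them can run in parallel while
// the main JavaScript thread stays free.
function hashInPool(count, iterations = 50000) {
  const tasks = Array.from({ length: count }, (_, i) =>
    new Promise((resolve, reject) => {
      crypto.pbkdf2('secret', `salt-${i}`, iterations, 32, 'sha256', (err, key) => {
        if (err) reject(err);
        else resolve(key.toString('hex'));
      });
    })
  );
  return Promise.all(tasks);
}

module.exports = { poolSize, hashInPool };
```

Calling `hashInPool(1)` and then `hashInPool(4)` while watching a CPU monitor should show the difference between one busy core and several, which is one way to check whether a workload is limited by the number of concurrent tasks rather than by the pool itself.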

FYI I don't actively maintain this module anymore. I simply try to check the easy stuff, help people from time to time, and accept pull requests.

@rainabba
Author

@albanm Thanks for the reply. The "async API of this lib" would be the following, right? If so, my overall CPU load is ~40% (quad core, with plenty else running), which is also what I'd expect for a single core. I am now running inside Docker (Ubuntu/WSL), but as I understand it, my container shouldn't be CPU-limited, as I've allowed 6 cores for Docker (it's a hyperthreaded quad core), so I'd expect to see at least 75% CPU busy if this were multi-threaded. Of course, I'm assuming a few things based on modern browser element parsing (since it's the same), and I've been working with XSLT at scale for more than a decade, so I've got a lot of experience/intuition but not the knowledge to be sure here; just educated guesses that keep getting better with time :)

My data/template takes (in this case) 360 entities across 20 different groupings. Each grouping is applied to the same set of data (having ~25,000 elements, matching only the data from that set) and that subset is applied to another template to get the final output (ultimately a full page of HTML/CSS that I then turn into a PDF using WKHTMLTOPDF).

In the past, I was using MS SQLXML 3.x (an OLD ISAPI filter that basically provided an HTTP endpoint bound to SQL and an XSLT, then returned the result of that stylesheet applied to the results from the SQL). As a parser, it was FAR faster (I can still run these on a legacy setup), but I don't want to deal with IIS anymore or even x86 .DLL ISAPI filters (I could use it outside IIS now).

I'm also finding that anything taking more than 800 seconds is dying, so I'm looking for timeout options today.

libxslt.parse(xsl, (err, stylesheet) => {
  stylesheet.apply(..., (err, result) => {
    // While this work is being done, I don't see the CPU being maxed on more than 1 core.
  });
});

@albanm
Owner

albanm commented Sep 25, 2018

Yes, that is the async API. If you don't see a higher CPU usage using this mode, then I guess you still have the option of implementing a higher level concurrency management (node cluster maybe).
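One way the "higher level concurrency management" idea could look is sketched below, using `child_process.fork` rather than the `cluster` module (a swapped-in but closely related technique). Everything here is an assumption for illustration: `partition`, `transformGroup`, `runPartitioned`, and the `XSLT_WORKER` environment variable are hypothetical names, and `transformGroup` is only a placeholder for where a stylesheet would be parsed and applied to one grouping.

```javascript
'use strict';
const { fork } = require('child_process');
const os = require('os');

// Split `items` round-robin into at most `n` buckets; empty buckets are dropped.
function partition(items, n) {
  const buckets = Array.from({ length: n }, () => []);
  items.forEach((item, i) => buckets[i % n].push(item));
  return buckets.filter((bucket) => bucket.length > 0);
}

// Hypothetical placeholder for the real work on one grouping. In a real
// application the stylesheet would be parsed and applied here.
function transformGroup(groupId) {
  return `result for grouping ${groupId}`;
}

// Child side: when this file is forked with XSLT_WORKER=1, wait for one
// batch, process it, send the results back, then close the IPC channel.
if (process.env.XSLT_WORKER === '1') {
  process.on('message', (batch) => {
    process.send(batch.map(transformGroup), () => process.disconnect());
  });
}

// Parent side: fork one child process per bucket and collect every result.
function runPartitioned(groupIds, workerCount = os.cpus().length) {
  const jobs = partition(groupIds, workerCount).map((batch) =>
    new Promise((resolve, reject) => {
      const child = fork(__filename, [], {
        env: { ...process.env, XSLT_WORKER: '1' },
      });
      child.once('message', resolve);
      child.once('error', reject);
      child.once('exit', (code) => {
        if (code !== 0) reject(new Error(`worker exited with code ${code}`));
      });
      child.send(batch);
    })
  );
  return Promise.all(jobs).then((parts) => parts.flat());
}

module.exports = { partition, runPartitioned };
```

With the 20 groupings mentioned earlier in the thread, `runPartitioned(Array.from({ length: 20 }, (_, i) => i), 4)` would spread them over four processes. Each process carries its own copy of the parsed documents, so this trades memory for CPU parallelism, and the results come back grouped by process rather than in the original order.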

As for timeout options, I don't remember.


3 participants