Support workerData as a pool option for worker threads? #65

Open
ianwalter opened this issue Apr 16, 2019 · 9 comments

Comments

@ianwalter

ianwalter commented Apr 16, 2019

It would be cool to have a way to make general data available on worker initialization instead of just through worker function parameters. I can submit a PR.

Ref: https://nodejs.org/api/worker_threads.html#worker_threads_worker_workerdata
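For context, here is a minimal sketch of how `workerData` behaves in plain Node worker threads, independent of workerpool. The helper name `readLogLevelInWorker` and the inlined worker source are assumptions made for illustration; only `Worker`, `workerData`, and `parentPort` come from Node's `worker_threads` module.

```javascript
// Sketch: hand a log level to a worker thread at construction time.
const { Worker } = require('worker_threads');

// The worker source is inlined (eval: true) so the example fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // workerData is a clone of the value passed to the Worker constructor.
  parentPort.postMessage('log level in worker: ' + workerData.logLevel);
`;

// Hypothetical helper: starts one worker and resolves with its first message.
function readLogLevelInWorker(logLevel) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { logLevel },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

readLogLevelInWorker('debug').then((message) => console.log(message));
```

The value is available as soon as the worker starts, without sending it as a task parameter, which is the behaviour the request above is asking workerpool to expose.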

@josdejong
Owner

Thanks for your suggestion @ianwalter. Do you have a concrete use case for this idea?

It could be interesting, though I would also love to keep the API the same for web workers, child_process, and worker threads, so let's think this through.

@ianwalter
Author

@josdejong The use case that made me think about this was wanting to set a log level for all of the workers to use based on the configuration parsed by the main thread.

I can understand wanting to keep the API the same, but I'm not sure how to do that or whether it would be worth it. For my project, I'm all in on worker threads and not trying to support the child process fallback.

@josdejong
Owner

Thanks for your explanation. Something like setting a log level could be relevant for browsers too, so maybe we can dig a bit deeper and think about a solution that will work in any environment.

Thinking aloud here: maybe a hook like onWorkerCreated, which lets you perform some action after a worker is created. In your case, that could mean invoking a setLogLevel method on the worker, or something like that.
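To make the hook idea concrete, here is a rough sketch built directly on Node's `worker_threads`, not on workerpool. Everything named here (`spawnWorkers`, `askLogLevel`, `demo`, the `onWorkerCreated` callback, and the `setLogLevel` / `getLogLevel` message types) is hypothetical and only illustrates the shape such a hook could take.

```javascript
// Sketch: a creation hook that configures each new worker (hypothetical API).
const { Worker } = require('worker_threads');

// Inlined worker: keeps a log level and answers queries about it.
const workerSource = `
  const { parentPort } = require('worker_threads');
  let logLevel = 'info';
  parentPort.on('message', (msg) => {
    if (msg.type === 'setLogLevel') {
      logLevel = msg.level;
    } else if (msg.type === 'getLogLevel') {
      parentPort.postMessage(logLevel);
    }
  });
`;

// Hypothetical helper: creates workers and calls the hook once per worker.
function spawnWorkers(count, onWorkerCreated) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    const worker = new Worker(workerSource, { eval: true });
    onWorkerCreated(worker);
    workers.push(worker);
  }
  return workers;
}

// Ask one worker which log level it currently holds.
function askLogLevel(worker) {
  return new Promise((resolve, reject) => {
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.postMessage({ type: 'getLogLevel' });
  });
}

async function demo() {
  const workers = spawnWorkers(2, (worker) => {
    // Messages on one port arrive in order, so this lands before any query.
    worker.postMessage({ type: 'setLogLevel', level: 'debug' });
  });
  const levels = await Promise.all(workers.map(askLogLevel));
  await Promise.all(workers.map((worker) => worker.terminate()));
  return levels;
}

demo().then((levels) => console.log(levels));
```

Because the hook only relies on message passing, the same pattern could in principle be mapped onto web workers and child processes, which is the cross-environment concern raised above.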

@sbrl

sbrl commented Apr 5, 2020

Hey there! This would be a really useful feature to have.

A workercreated event (e.g. pool.on("workercreated")) would be cool, but without a way to execute a function on specifically the new worker that was just created, it would be of limited help.

I suggest allowing an extra state object to be passed in when the pool is created. For example:

// master
let pool = workerpool.pool({
    // Other options go here
    state_info: { foo: 5, bar: "some_string" }
});

// worker
console.log("I have state info:", workerpool.state_info);

Context: For my PhD I am handling a large dataset. I need to parallelise the processing thereof, but to process it I need to pass in a reference 2D array that's complicated and potentially computationally expensive to initialise and create. To this end, I want to initialise it once on the master, and then pass it to all workers via (immutable) shared state.
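As a point of comparison, plain Node worker threads can already share a reference table without copying it by placing it in a `SharedArrayBuffer` and passing that through `workerData`. The sketch below assumes a small numeric 2D table flattened row by row; `lookupInWorker`, the table size, and the placeholder values are all made up for illustration. Note that nothing enforces immutability here: the workers simply agree to read only.

```javascript
// Sketch: build a 2D reference table once, share it with a worker by reference.
const { Worker } = require('worker_threads');

const ROWS = 3;
const COLS = 4;

// Shared memory backing the table, stored row-major as a flat Float64Array.
const shared = new SharedArrayBuffer(ROWS * COLS * Float64Array.BYTES_PER_ELEMENT);
const table = new Float64Array(shared);
for (let r = 0; r < ROWS; r++) {
  for (let c = 0; c < COLS; c++) {
    table[r * COLS + c] = r * 10 + c; // placeholder values
  }
}

const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { shared, cols } = workerData;
  // A view over the same memory as the main thread; the data is not copied.
  const table = new Float64Array(shared);
  parentPort.once('message', ({ row, col }) => {
    parentPort.postMessage(table[row * cols + col]);
  });
`;

// Hypothetical helper: asks a fresh worker to read one cell of the table.
function lookupInWorker(row, col) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared, cols: COLS },
    });
    worker.once('message', (value) => {
      resolve(value);
      worker.terminate();
    });
    worker.once('error', reject);
    worker.postMessage({ row, col });
  });
}

lookupInWorker(2, 3).then((value) => console.log('table[2][3] =', value));
```

This only works for data that fits a typed array; an arbitrary object graph passed through `workerData` would be cloned per worker instead.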

@josdejong
Owner

Thanks for your input, that is a very simple and elegant approach @sbrl !

I have to double-check whether it's possible to expose the state directly as a property workerpool.state_info (or simply workerpool.state), or whether we need a getter for it like workerpool.getState().

@sbrl

sbrl commented Apr 21, 2020

Thanks, @josdejong! Either would be great if possible.

I think it's probably a good thing to encourage immutable shared state in particular, since lots of bugs can arise from having mutable shared state that's modified by multiple processes at the same time.

@josdejong
Owner

> I think it's probably a good thing to encourage immutable shared state in particular, since lots of bugs can arise from having mutable shared state that's modified by multiple processes at the same time.

Totally agree!

Anyone interested in implementing this feature?

@adrfantini

I saw that there are two available but undocumented options, forkArgs and forkOpts, both of which are passed to child_process.fork when the worker is created.
Maybe these can be used to pass simple string data to the workers? As I understand it, though, this could be just a horrible workaround.
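A rough sketch of what that workaround amounts to at the `child_process.fork` level: the configuration is serialised to a JSON string, passed as a command-line argument, and parsed again in the child. This calls `fork` directly instead of going through workerpool; the helper `forkWithConfig` and the temporary child script are assumptions made to keep the example in one file.

```javascript
// Sketch: pass configuration to a forked child as a JSON string argument.
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child script: parse the first extra argument and echo it back over IPC.
const childSource = `
  const config = JSON.parse(process.argv[2]);
  process.send(config, () => process.disconnect());
`;

// Hypothetical helper: writes the child script to a temp dir and forks it.
function forkWithConfig(config) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-args-'));
  const childPath = path.join(dir, 'child.cjs');
  fs.writeFileSync(childPath, childSource);
  return new Promise((resolve, reject) => {
    // The second parameter of fork is the list of string arguments.
    const child = fork(childPath, [JSON.stringify(config)]);
    child.once('message', resolve);
    child.once('error', reject);
    // Clean up the temporary script once the child has exited.
    child.once('exit', () => fs.rmSync(dir, { recursive: true, force: true }));
  });
}

forkWithConfig({ logLevel: 'debug' }).then((config) => console.log(config));
```

This only covers the child process backend and is limited by command-line length, so it does not replace a proper option that also works for worker threads and web workers.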

@sbrl

sbrl commented Jan 22, 2021

> pass simple string data

If only a string can be passed, I recommend automatic serialisation to JSON. That said, IIRC, when you send an object to a different process/thread in Node it is serialised automatically, potentially with higher performance, but I haven't tested that.
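To illustrate the difference between the two serialisation paths: messages between Node worker threads go through the structured clone algorithm, which preserves types such as `Date` and `Set`, while a JSON round trip does not. The sketch below uses a `MessageChannel` within a single thread so the result can be read synchronously; the variable names are illustrative, and it makes no claim about relative performance.

```javascript
// Sketch: structured clone (as used by postMessage) versus a JSON round trip.
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const original = { when: new Date(0), tags: new Set(['a']) };

// postMessage clones the value; receiveMessageOnPort reads it synchronously.
const { port1, port2 } = new MessageChannel();
port1.postMessage(original);
const viaClone = receiveMessageOnPort(port2).message;
port1.close();

// A JSON round trip turns the Date into a string and the Set into {}.
const viaJson = JSON.parse(JSON.stringify(original));

console.log(viaClone.when instanceof Date, viaClone.tags instanceof Set);
console.log(typeof viaJson.when, viaJson.tags);
```

For child processes, `child_process.fork` uses JSON-based serialisation for IPC messages by default, so the richer behaviour shown here applies to worker threads specifically.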
