
Add concurrency option #69

Merged
merged 6 commits into from Aug 3, 2019

Conversation

@threequartersjohn (Contributor) commented Jul 31, 2019

Fixes #34.

Switches Promise.all for p-all and adds a concurrency option with the suggested default value.

@hugomrdias left a comment


Can you please change the PR title to "Add concurrency option" ?

I would suggest adding a smoke test with options.concurrency explicitly set just to make sure future changes don't break the p-all behaviour.

Everything else LGTM.

index.js Outdated
@@ -31,6 +33,13 @@ const preprocessDestinationPath = (source, destination, options) => {

module.exports = (source, destination, options = {}) => {
const progressEmitter = new EventEmitter();
let concurrencyDegree;


Why did you create a new var to make options.concurrency default to (os.cpus().length || 1) * 2 ?

@threequartersjohn (Contributor, Author) replied:

I added the new variable to make the meaning of the value explicit when reading the code. Do you think it's unnecessary? Having it like this also makes the default value easy to find. I could avoid adding a variable, or abstract the default value into its own variable and pass it with {concurrency: options.concurrency || defaultConcurrency} instead; my only concern was the readability of the code.

A reviewer replied:

I think what @hugomrdias is saying (and hinting at) is that you can simplify the code. For example, you could leverage object spread to mix default options in with the ones the user supplies.

@threequartersjohn (Contributor, Author) replied:

Hmm, I'm not sure what you mean by object spread, since this code change operates on only one of the options. The other options are handled in other functions.

I could default it like:

    module.exports = (source, destination, options = {concurrency: (os.cpus().length || 1) * 2}) => [...]

...which also makes the default value easy to find. I'm not sure if this would work. Do you think that would be better?

@satazor replied Aug 1, 2019:

You are right in the sense that each option is currently being handled in its own function, so perhaps let's keep the consistency.

Still, we may simplify with:

    const concurrency = options.concurrency || (os.cpus().length || 1) * 2;

Also, by renaming the variable to concurrency from concurrencyDegree, you may use it directly when passing it to pAll.

Other than that, the PR looks good. However, I suggest adding a simple test (smoke test) to see if the concurrency is being respected.

@threequartersjohn changed the title from "Add currency option" to "Add concurrency option" Aug 1, 2019
@threequartersjohn (Contributor, Author) commented:

About the test, I'm unsure how to proceed. I could make the function return the concurrency value and test that, but I feel changing the return value is beyond the scope of this PR. Otherwise I'm not sure how concurrency could be tested. Maybe by timing the operation, as in p-map/test.js, but that seems highly variable.

index.d.ts (outdated review comment, resolved)
index.js (outdated review comment, resolved)
@sindresorhus changed the title from "Add concurrency option" to "Add concurrency option" Aug 2, 2019
@satazor commented Aug 2, 2019:

> About the test, I'm unsure how to proceed. I could make the function return the concurrency value and test that, but I feel changing the return value is beyond the scope of this PR. Otherwise I'm not sure how concurrency could be tested. Maybe by timing the operation, as in p-map/test.js, but that seems highly variable.

Yep, it's hard to test in this case... if we were using Jest, we could do a stub test instead, checking whether pAll was called with the correct concurrency or whether cpy does run in parallel.

@sindresorhus (Owner) commented:

> Yep, it's hard to test in this case...

I don't think it needs testing. We should trust that p-all does its thing, and it's tested.

> if we were using Jest we could do a stub test instead

You can stub with AVA too. You just have to bring your favorite stubbing library.

@sindresorhus sindresorhus merged commit 4e282cc into sindresorhus:master Aug 3, 2019
@sindresorhus (Owner) commented:

Looks good :)

@stroncium (Contributor) commented Sep 20, 2019:

Regarding the default value for concurrency: it is somewhat unexpectedly based on the completely unrelated number of CPUs in the system. It doesn't make sense, and (though I didn't test it) I'd expect it to slow things down considerably on modern systems (on the order of a hundred times for small files on an SSD on a system with 2-4 cores).

An ideal default value would be based on the kernel-defined limit on open descriptors and the "pipeline width" (which doesn't actually exist as a single number, since it depends on the type of operations performed) of all the devices that the file systems we work with sit on.

The point of using a concurrency limit is to reduce fd pressure on the system, and also to reduce memory usage. But the typical values we'd get with this default (the 2-128 range, with 2 for micro VMs and 128 for huge servers) are far below the values which would apply serious pressure (compared to the pressure introduced by Node.js itself), and also usually far below the values which would saturate IO (probably in the range of 64-1024 for similar systems).

So, I'd advise just setting it to some relatively high constant number (my earlier guess was 1024), which will both saturate IO on almost all systems and not be too taxing on fds/memory.

Just to amplify all that: basing IO concurrency on the number of CPUs is complete bullshit.

Linked issue closed by this pull request: Copy in serial?
5 participants