Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very high bandwidth usage in monorepo #1351

Open
3 tasks done
ElPrudi opened this issue Nov 18, 2023 · 24 comments
Open
3 tasks done

Very high bandwidth usage in monorepo #1351

ElPrudi opened this issue Nov 18, 2023 · 24 comments
Labels

Comments

@ElPrudi
Copy link

ElPrudi commented Nov 18, 2023

  • I have searched for similar issues
  • I am using the latest version of npm-check-updates
  • I am using node >= 14.14

Steps to Reproduce

.ncurc:

{
    "format": "group",
    "root": ".",
    "upgrade": true,
    "workspaces": true
}

Steps:

I've have a monorepo that has a main react package and some utility packages with the sames dependencies: @types/node, eslint, vitest and typescript.

I used AppNetworkCounter to monitor the bandwidth usage.

Current Behavior

My guess is, that this script re-checks every dependency even if it was already checked before. That is, if I have 4 packages in my monorepo each with @types/node, it checks it four times instead of caching it. As I wanted to update my monorepo, the bandwidth shot in the air with like 3MB/s and 1 or 2MB of requests and file reading for each dependency.

Expected Behavior

  • Find a way to reduce the needed bandwidth for each dependency
  • Cache duplicate packages
@raineorshine
Copy link
Owner

raineorshine commented Nov 19, 2023

Hi, thanks for reporting.

In a single run of npm-check-updates results are memoized by dependency name. So in a monorepo you should see only one request per dependency, even if it is used in many packages.

Note that you can enable caching across runs with the --cache option. The default behavior is to fetch new dependencies every time. This prioritizes accuracy over performance.

Cache versions to a local cache file. Default --cacheFile is ~/.ncu-cache.json and default --cacheExpiration is 10 minutes.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Ah, thank you very much. My bad for not reading the docs. The thing is: I have packages with the exact same dependencies and it was still checking and requesting for each one.

@raineorshine
Copy link
Owner

raineorshine commented Nov 19, 2023

Sorry I misunderstood initially (edited). If you are seeing multiple requests for the same dependency, then we should investigate as that is not the expected behavior. It should memoize the results of every dependency across monorepo packages.

If you have a simple reproduce case that would be helpful. If not, I'll do a test run in workspaces mode and print some logs to see if I can reproduce it.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

I can tell you the stats of mine after fetching 90 dependencies:

Received Bytes: 80.039.784
Sent Bytes: 35.658
Receive Speed: Roughly 2.7 MB/s
Received Packets: 51.921
Sent Packets: 103

The sent packets does roughly match the amount of dependencies I have, so there must be a problem with the amount of data these requests hold. I guess, a simple vite create command and creating a simple mono repo with 4 packages with these dependencies should be enough to trigger this:

{
    "devDependencies": {
        "@types/node": "^20.9.2",
        "@vitest/coverage-v8": "^0.34.6",
        "@vitest/ui": "^0.34.6",
        "eslint": "^8.54.0",
        "tslib": "^2.6.2",
        "typescript": "^5.2.2",
        "vitest": "^0.34.6"
    }
}

I can't currently create a simple repro for that, sorry. I try to make one as fast as possible later on.

@raineorshine
Copy link
Owner

Great, thanks. Can you also send the stats for ncu typescript? That will tell us how many bytes are in a single request.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Yup, here are the stats:

Received Bytes: 7.069.026
Send Bytes: 817
Receive Speed: Roughly 2.7 MB/s
Received Packets: 2.103
Send Packets: 3

It's basically downloading the entire package from the release tab.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Also, yes, cache does help. Previously I had to cancel the ncu process, but now I tested it on a monorepo with way more packages:

Received Bytes: 160.470.651
Send Bytes: 69.258
Receive Speed: Roughly 6.5 MB/s
Received Packets: 43.501
Sent Packets: 201

With cache on, it's much less:

Received Bytes: 33.774.245
Sent Bytes: 36.418
Receive Speed: Roughly 3MB/s
Recevied Packets: 11.606
Sent Packets: 107

@raineorshine
Copy link
Owner

It's basically downloading the entire package from the release tab.

I'm not sure what you mean by "release tab", but npm-check-updates does download the entire package manifest for each dependency. It seems possible to just fetch the dist-tags, which would be way less bandwidth, though we would lose the deprecation check (deprecated versions are currently excluded from ncu results by default).

With cache on, it's much less

Great, thanks. So in that case it could be a bug in the memoization. I will look into it.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

I was refering to this tab.

Wouldn't it be possible to just read the first lines of a README or package.json or other manifest files? Is it possible to just request the first few chunks of the whole manifest file until a package version is found? Because 7.4 MB for just Typescript is way too much if you can just read it from smaller files of the repo.

@raineorshine
Copy link
Owner

Wouldn't it be possible to just read the first lines of a README or package.json or other manifest files?

package.json is the only manifest file. The README does not typically contain the latest version (if it appears on a README, it's usually in the form of an SVG badge that is cached from the npm registry).

Is it possible to just request the first few chunks of the whole manifest file until a package version is found?

That's a great idea to stream it. Once #1329 gets merged, we should be able to use fetch.json.stream.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Still its weird that the received packets are 7MB in total when the entire package.json file of typescript is just 3.44 KB. There must be a lot of unnecessary data fetched as well.

@raineorshine
Copy link
Owner

Yes, good point. npm-check-updates currently uses pacote.packument which gets all manifests of all versions. That seems... bad 😅

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Ah, so it essentially boils down to a faulty dependency. Well, I hope the new dependency reduces the bandwidth usage a lot. At least this issue gave us a little insight about this problem too.

If you finally got this to work, try testing the bandwidth usage with AppNetworkCounter. It's free and doesn't even need an installation as it runs out of the box.

I leave this issue open until the bandwidth is reduced in ncu 17.

@raineorshine
Copy link
Owner

Basically yes. The use of other --target values requires getting all published versions, but the default --target latest should definitely not need them.

So there are three areas where the performance can be improved:

  1. Fix the bug that is causing duplicate packages to not be memoized correctly in workspaces
  2. Use pacote.manifest instead of pacote.packument.
  3. Use streaming to get the latest version in the minimum amount of bytes.

raineorshine added a commit that referenced this issue Nov 19, 2023
).

This was preventing memoization of duplicate packages in workspaces.
@raineorshine
Copy link
Owner

Let's start with the easy one. I've fixed #1 and published to v16.14.7. It will properly memoize the pacote call and only fetch each packument once now in workspace mode.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Ok, this small change made this tool MUCH faster and reduced the bandwidth usage by a lot. Basically cache: true and cache: false are behaving the same now in default settings.

So the only issue is still the fetching of each dependency. ncu typescript still has roughly the same stats like before.

@raineorshine
Copy link
Owner

Great, I'm glad that's fixed.

I replaced pacote.packument with pacote.manifest when fetching a tag (e.g. --target latest, the default). Published to v16.14.8. Try it out and let me know if you see any reduction in bandwidth.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Only the performance changed. TypeScript still costs 7MB and my repository still costs 33MB. The only thing that got reduced was the received packets with ncu typescript from 2K to 678.

@raineorshine
Copy link
Owner

Hmmm. I would have expected that to change. I'll try it on a couple more test cases.

@raineorshine
Copy link
Owner

There was a small error in my code that was preventing the manifest case.

Please try v16.14.9 when you get a chance.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Now, this made it worse:

ncu typescript:

Received Bytes: 14.133.320
Sent Bytes: 1.165
Received Packets: 4.324
Sent Packets: 4

My repo:

Received Bytes: 67.151.019
Sent Bytes: 68.388
Received Packets: 29.317
Sent Packets: 196

You essentially doubled the downloading and also the requests got more. I guess instead of replacing, you just added it on top.

@raineorshine
Copy link
Owner

Turns out it was a mistake with the arguments passed to pacote.manifest. It caused an empty response, which triggered a fallback fetch that passed the functional tests.

Fixed in v16.14.10. (i.e. back to v16.14.7)

As for the download size, it turns out that pacote fetches the full packument even when calling pacote.manifest.

https://github.com/npm/pacote/blob/9e9e18761dc2091f0030a9a4758a47ee3d8b11a4/lib/registry.js#L118

Disappointing, but it fits my understanding that the npm registry api does not have an additional endpoint for just the manifest. The real solution will be an abortable stream.

@ElPrudi
Copy link
Author

ElPrudi commented Nov 19, 2023

Damn, but thanks for trying. Well, you managed to help me a lot with my monorepo, so I'm gonna wait for ncu 17 when you implemented the streams approach.

@raineorshine
Copy link
Owner

Thanks for taking the time to report the issue. I learned some things!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants