Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-introduce chunk clustering #6

Open
mgechev opened this issue Apr 24, 2018 · 3 comments
Open

Re-introduce chunk clustering #6

mgechev opened this issue Apr 24, 2018 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@mgechev
Copy link
Member

mgechev commented Apr 24, 2018

Prior implementation

The initial chunk clustering implementation accepts a minimum number of chunks that need to be produced by the build process and goes through a graph clustering based on Tarjan's algorithm for finding strongly connected components.

This algorithm has a number of problems:

  • It treats chunks belonging to nested routes and chunks without a common ancestor the same way.
  • It terminates when the number of chunks in the application is at most the minimum number of chunks specified in the configuration. Configuring the minimum number of chunks the application should consist is not a good heuristic for concatenating JavaScript files together.
  • The algorithm does not consider the chunk size but only the probability two chunks to be visited in the same session.

Suggested implementation

Combine only chunks of nested routes where the probability of two chunks to be used together is high (threshold should be discussed).

In such case, if we have the following routes in the application:

/a/b/c
/b
/c
/d
/e/f
/e

We'll recommend developers to load everything lazily (i.e. even load lazily /a, /a/b, and /a/b/c).

In such case, the probability /a, /a/b, and /a/b/c to be visited in the same session is 1 (because we have only one route with root /a), for /e and /e/f the probability is n.

In this case, we'll cluster the chunks from /a, /a/b, and /a/b/c into a single file. Depending on the value of n, we may also combine /e and /e/f.

When the user is at any page (e.g. /a/b/c) we'll prefetch the other bundles based on the Markov chain which we've built in the prefetch plugin.

Note: this algorithm does not solve the problem which occurs when combining large chunks together.

@addyosmani
Copy link
Contributor

Thanks for writing this up, Minko! For an application without deeply nested routes, I think the above implementation could be reasonable.

Let's attempt to mentally apply this to Shop - an example that would match the level of nesting in your suggestion. Clustering the entry route, a category chunk /list/mens_outerwear and chunk for the product view detail/mens_outerwear/Anvil+L+S+Crew+Neck+-+Grey into a single file for prefetching makes sense to me. Let's imagine the routes are cleaner, but the overall navigation path matches your a, a/b, a/b/c pattern.

Where this becomes more complicated is:

  • In a navigation path a/b/c, what do we do if /b/ or /c/ are relatively large. For arguments sake, let's say a=20KB, b=200KB and c=300KB in size. Prefetching these chunks will improve the load times for future navigations, but also start to creep into size might matter territory. We could augment your proposal with the notion of a maximum clustered chunk size:
maxClusteredChunkSize: 250

For a/b/c, the above configuration could mean a will prefetch a clustered chunk with a maximum size of 250 - b fits into this budget, but c does not. This means we will only cluster/prefetch b but not c. If the budget is updated to maxClusteredSize: 520 or so, a/b/c will end up in the same clustered chunk for prefetching.

  • An application has deeply nested routes. If I'm Walmart.com and a page could be 6-7 levels deep into the navigation (e.g a/b/c/d/e/f), is there a circumstance where we would ever end up clustering a-f into the same chunk? If so, I'd have concerns about size and over-consuming bandwidth.

The concept of maxClusteredChunkSize could help here too or we could intro a way to limit how much nesting is clustered.

@mgechev mgechev self-assigned this May 1, 2018
@mgechev
Copy link
Member Author

mgechev commented May 1, 2018

Another corner case I see is if a child module (C) is used by two parent modules (A and B). In such case, it will be dangerous to combine C with A and B (i.e. form the chunks AC and BC) because this may cause state duplication.

If a user wants to load C lazily from both A and B then when webpack fetches C it'll keep a single instance of C in-memory and share it with A and B. However, if we combine A with C and B with C during the build process, this will keep two instances of C in-memory.

The solution I see is to combine C only with one of its parents (e.g. the one with the highest probability to require C). Another nice feature, allowing users to forbid clustering of specific modules. If the users want to load C lazily in all cases, it might be a good idea to allow them to.

@addyosmani
Copy link
Contributor

The solution I see is to combine C only with one of its parents (e.g. the one with the highest probability to require C).

👍

@addyosmani addyosmani mentioned this issue May 17, 2018
13 tasks
@mgechev mgechev added the enhancement New feature or request label May 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants