Re-introduce chunk clustering #6

mgechev · 2018-04-24T02:07:35Z

Prior implementation

The initial chunk clustering implementation accepts a minimum number of chunks that need to be produced by the build process and goes through a graph clustering based on Tarjan's algorithm for finding strongly connected components.

This algorithm has a number of problems:

It treats chunks belonging to nested routes and chunks without a common ancestor the same way.
It terminates when the number of chunks in the application is at most the minimum number of chunks specified in the configuration. Configuring the minimum number of chunks the application should consist is not a good heuristic for concatenating JavaScript files together.
The algorithm does not consider the chunk size but only the probability two chunks to be visited in the same session.

Suggested implementation

Combine only chunks of nested routes where the probability of two chunks to be used together is high (threshold should be discussed).

In such case, if we have the following routes in the application:

/a/b/c
/b
/c
/d
/e/f
/e

We'll recommend developers to load everything lazily (i.e. even load lazily /a, /a/b, and /a/b/c).

In such case, the probability /a, /a/b, and /a/b/c to be visited in the same session is 1 (because we have only one route with root /a), for /e and /e/f the probability is n.

In this case, we'll cluster the chunks from /a, /a/b, and /a/b/c into a single file. Depending on the value of n, we may also combine /e and /e/f.

When the user is at any page (e.g. /a/b/c) we'll prefetch the other bundles based on the Markov chain which we've built in the prefetch plugin.

Note: this algorithm does not solve the problem which occurs when combining large chunks together.

The text was updated successfully, but these errors were encountered:

addyosmani · 2018-04-29T19:52:53Z

Thanks for writing this up, Minko! For an application without deeply nested routes, I think the above implementation could be reasonable.

Let's attempt to mentally apply this to Shop - an example that would match the level of nesting in your suggestion. Clustering the entry route, a category chunk /list/mens_outerwear and chunk for the product view detail/mens_outerwear/Anvil+L+S+Crew+Neck+-+Grey into a single file for prefetching makes sense to me. Let's imagine the routes are cleaner, but the overall navigation path matches your a, a/b, a/b/c pattern.

Where this becomes more complicated is:

In a navigation path a/b/c, what do we do if /b/ or /c/ are relatively large. For arguments sake, let's say a=20KB, b=200KB and c=300KB in size. Prefetching these chunks will improve the load times for future navigations, but also start to creep into size might matter territory. We could augment your proposal with the notion of a maximum clustered chunk size:

maxClusteredChunkSize: 250

For a/b/c, the above configuration could mean a will prefetch a clustered chunk with a maximum size of 250 - b fits into this budget, but c does not. This means we will only cluster/prefetch b but not c. If the budget is updated to maxClusteredSize: 520 or so, a/b/c will end up in the same clustered chunk for prefetching.

An application has deeply nested routes. If I'm Walmart.com and a page could be 6-7 levels deep into the navigation (e.g a/b/c/d/e/f), is there a circumstance where we would ever end up clustering a-f into the same chunk? If so, I'd have concerns about size and over-consuming bandwidth.

The concept of maxClusteredChunkSize could help here too or we could intro a way to limit how much nesting is clustered.

mgechev · 2018-05-01T03:27:35Z

Another corner case I see is if a child module (C) is used by two parent modules (A and B). In such case, it will be dangerous to combine C with A and B (i.e. form the chunks AC and BC) because this may cause state duplication.

If a user wants to load C lazily from both A and B then when webpack fetches C it'll keep a single instance of C in-memory and share it with A and B. However, if we combine A with C and B with C during the build process, this will keep two instances of C in-memory.

The solution I see is to combine C only with one of its parents (e.g. the one with the highest probability to require C). Another nice feature, allowing users to forbid clustering of specific modules. If the users want to load C lazily in all cases, it might be a good idea to allow them to.

addyosmani · 2018-05-01T14:59:38Z

The solution I see is to combine C only with one of its parents (e.g. the one with the highest probability to require C).

👍

mgechev self-assigned this May 1, 2018

addyosmani mentioned this issue May 17, 2018

Roadmap proposals #50

Closed

13 tasks

mgechev added the enhancement New feature or request label May 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-introduce chunk clustering #6

Re-introduce chunk clustering #6

mgechev commented Apr 24, 2018 •

edited

addyosmani commented Apr 29, 2018

mgechev commented May 1, 2018 •

edited

addyosmani commented May 1, 2018

Re-introduce chunk clustering #6

Re-introduce chunk clustering #6

Comments

mgechev commented Apr 24, 2018 • edited

Prior implementation

Suggested implementation

addyosmani commented Apr 29, 2018

mgechev commented May 1, 2018 • edited

addyosmani commented May 1, 2018

mgechev commented Apr 24, 2018 •

edited

mgechev commented May 1, 2018 •

edited