Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hmr): implement better module node equality checks #12125

Closed

Conversation

benatshippabo
Copy link
Contributor

@benatshippabo benatshippabo commented Feb 20, 2023

Description

Problem 1: using set.has(moduleNode)

Currently, we use a Set to try to deduplicate objects in hmr.ts. This will not work if the boundary is not referencing the same object..

Map<ModuleNode, Set<ModuleNode>> will not work for the same reason

So we create a HashSet class which takes a hash function and replace usages of Set with HashSet. The hash function we plan to use is as follows:

const getModuleNodeHash = (moduleNode) => moduleNode.id ?? moduleNode.url
const getHash = ({ boundary, acceptedVia }) => 
  `${getModuleNodeHash(boundary)}:${getModuleNodeHash(acceptedVia)}`

Note
I am not sure when the boundary is assigned the id. Please let me know of a better hash function

Problem 2: using moduleNodes.includes(moduleNode)

I also noticed we use moduleNodeArray.includes(moduleNode), this will cause bugs if moduleNode is not referencing the module node already within the array. So I fixed it by using .some()

Additional context

fixes #12062


What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Before submitting the PR, please make sure you do the following

  • Read the Contributing Guidelines.
  • Read the Pull Request Guidelines and follow the PR Title Convention.
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. fixes #123).
  • Ideally, include relevant tests that fail without this PR but pass with it.

Comment on lines +33 to +47
it('should update existing values', () => {
const set = new HashSet<{
id: string
value: string
}>({
getHash: (value) => value.id,
})

set.add({ id: '1', value: 'a' })
set.add({ id: '1', value: 'b' })

expect(set.size).toBe(1)
expect(set.has({ id: '1', value: 'a' })).toBe(true)
expect(set.get({ id: '1', value: 'b' })?.value).toBe('b')
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: not sure if this is the behavior we want, let me know if we don't want to update entries with the same id

@benatshippabo benatshippabo changed the title fix(hmr): use hash set instead of native set fix(hmr): implement better module node equality checks Feb 24, 2023
Comment on lines +55 to +58
export const isModuleNodeEqual =
(a: ModuleNode) =>
(b: ModuleNode): boolean =>
getModuleNodeHash(a) === getModuleNodeHash(b)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: there is probably a better way to do this

Copy link
Member

@ArnaudBarre ArnaudBarre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this fix.

I think expecting reference equality for module in the graph is totally fine.

The set for boundaries is effectively is bad pattern. But it's only used as an append only data structure, so an array would be enough.

Do you have an clue where the reference equality fails and triggers an infinite recursion?

@benatshippabo
Copy link
Contributor Author

benatshippabo commented Mar 13, 2023

I'm not sure about this fix.

I think expecting reference equality for module in the graph is totally fine.

The set for boundaries is effectively is bad pattern. But it's only used as an append only data structure, so an array would be enough.

Do you have an clue where the reference equality fails and triggers an infinite recursion?

@ArnaudBarre we inline the boundary when adding it to the set here so the reference check will always fail

boundaries.add({
boundary: node,
acceptedVia: node,
})

For background, this PR was created because the Set hit the maximum size in this issue #12062

Maybe this isn't the root cause though


edit: after rereading your question again, if you are asking about problem 2 and this unit test

it('should return early if there is a circular update', () => {
const node = new ModuleNode('/x.js')
node.importers.add(node)
const boundaries = new HashSet<{
boundary: ModuleNode
acceptedVia: ModuleNode
}>({
getHash: ({ boundary, acceptedVia }) =>
`${getModuleNodeHash(boundary)}:${getModuleNodeHash(acceptedVia)}`,
})
const previousChain = [new ModuleNode('/x.js')]
expect(propagateUpdate(node, boundaries, previousChain)).toBe(true)
})

I have not been able to reproduce the cycle and am totally fine with removing those changes from this pr

@mattrunyon
Copy link
Contributor

mattrunyon commented Mar 13, 2023

I'm not sure about this fix.

I think expecting reference equality for module in the graph is totally fine.

The set for boundaries is effectively is bad pattern. But it's only used as an append only data structure, so an array would be enough.

Do you have an clue where the reference equality fails and triggers an infinite recursion?

An array here causes terrible HMR performance in some cases. The Set gets mapped to an array in updateModules, so it should be deduplicated properly to prevent reloading the same module multiple times.

For reference, I get HMR triggers that include only 14 files, but triggers 20k+ updates of those same 14 files (between 1 and 5k per file). HMR takes 30-60s w/ my browser basically frozen. This renders HMR basically unusable. I am using aliases to other modules in a monorepo, and when I remove those aliases and just reference the built files, I get 67k updates for changing the same line in the built version.

I added some quick and dirty de-dupe to the Set after the propagateUpdate call and with the actual 14 files, HMR is exactly what I would expect.

I'm not sure why a Map<Node, Set<Node>> wouldn't work though. The TS playground link in the OP is not a 1:1 comparison. You could instead use the node or importer as the keys/set values and those should be stable. Then recreate the Set on the way out or just update updateModules to handle the new data structure since it doesn't look like propagateUpdate is exported or used anywhere else.

@benatshippabo
Copy link
Contributor Author

I'm not sure why a Map<Node, Set<Node>> wouldn't work though. The TS playground link in the OP is not a 1:1 comparison. You could instead use the node or importer as the keys/set values and those should be stable. Then recreate the Set on the way out or just update updateModules to handle the new data structure since it doesn't look like propagateUpdate is exported or used anywhere else.

If the nodes being passed in truly are stable, sure the Map solution would work. Although I do feel that using a hash set is more elegant since it has the same api as Set and makes no assumptions on the parameters

@ArnaudBarre
Copy link
Member

ArnaudBarre commented Mar 14, 2023

I get HMR triggers that include only 14 files, but triggers 20k+ updates of those same 14 files

There is something wrong in the boundaries propagation so. Even with a set for data structure, it stills means the propagate update is doing cycles without detecting circular references or something like this. We should try to understand what happens there and fix this, dedupe is not the root fix.

As I said, the current use of Set is misleading, but we aren't doing any reference check with it, so we should simplify the data structure should just be an array (if we keep the current algorithm).

First what is important is to understand what fails in the current algorithm, and then we can discuss how to solve the reference equality or avoid the mismatch of it.

@benatshippabo
Copy link
Contributor Author

First what is important is to understand what fails in the current algorithm, and then we can discuss how to solve the reference equality of avoid the mismatch of it.

Makes sense to me

@patak-dev
Copy link
Member

@benatshippabo I think we should close this PR, no? As it was discussed, we should check why there are duplicated modules in the first place. The solution will probably not involve a HashSet. We could keep discussing in #12062, and this PR with the discussion until here is already linked to it as a reference.

@mattrunyon
Copy link
Contributor

mattrunyon commented Mar 29, 2023

If this is closed, can #12062 at least be bumped up in priority? It is a horrible performance problem in normal HMR situations on a large project, not just when checking out a branch while Vite is running

@benatshippabo
Copy link
Contributor Author

Not the solution we are looking for

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vite crash when switching between two significantly different branches - RangeError: Set maximum size exceeded
4 participants