Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing state vector should be usable for update encoding #641

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Horusiath
Copy link

@Horusiath Horusiath commented May 2, 2024

The following test confirms an edge case when missing state vector is not correctly computed.

When we get an out-of-order update for a client, that has never been seen before, a computed missing state vector starts from that update start clock instead of 0.

Second commit contains proposed solution.

Huly®: YJS-816

@dmonad
Copy link
Member

dmonad commented May 2, 2024

I think the current implementation is preferable. The additional case is not needed.

When we know that operation o1 depends on the missing o2(clock:3, client: 2), then we populate missingSV.set(2, 3). Once we receive o2, we might figure out that we also need o3(clock:2, client: 2).

We don't need any more information to determine that something is missing.

The reason why I think the current approach is preferable is as follows: We should not try to apply o2 with missingSV( {2: 3} ) once we receive o4(client: 2, clock 0), because it still does not resolve the missing dependency.

It is not required that missingSV is "complete" (i.e. it contains all operations that the update might require). This would also require much more computation. But it should be sound (contain only operations that are required)

I think I get what you are trying to achieve. If you notice that there are missing operations, then you want to send the missingSV instead of recalculating the full state vector. There are other edge cases when this might not work in the current codebase of Yjs. One example: we only compute the first "missing operation" for each operation (first we report origin, then we might report rightOrigin).

Furthermore, you still might have other lost operations that are no dependencies of other operations. Loosing updates in a client-server model with a reliable protocol (TCP/WS) is a REALLY bad thing. That means that somewhere in the codebase, updates are not propagated correctly. You can't detect all kinds of dataloss with missingSV - this only captures a small number of edge cases.

A quickfix would be to do a full resync in regular intervals. This would at least capture all kinds of dataloss. However, it might make sense to reevaluate the codebase itself. Often, usercode throws exceptions, which prevents the propagation of otherwise well-formed updates. It the case of an exception, the connection should be closed.

@Horusiath
Copy link
Author

Yeah, I think I've galloped too far. With this PR approach missing state vector would be actually equivalent to document's own state vector. Question would be: what things existing state vector could be used for?

@dmonad
Copy link
Member

dmonad commented May 3, 2024

The use-case of missingSV is only "if any of the ops in missingSV is applied, then we can retry applying the associated update".

There is probably little overlap with the state vector which describes "what we have applied to our own document".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants