missing state vector should be usable for update encoding #641

Horusiath · 2024-05-02T06:12:40Z

The following test confirms an edge case in which the missing state vector is computed incorrectly.

When we receive an out-of-order update for a client that has never been seen before, the computed missing state vector starts from that update's start clock instead of 0.

The second commit contains the proposed solution.

_{Huly®: YJS-816}

dmonad · 2024-05-02T12:12:19Z

I think the current implementation is preferable. The additional case is not needed.

When we know that operation o1 depends on the missing o2(clock:3, client: 2), then we populate missingSV.set(2, 3). Once we receive o2, we might figure out that we also need o3(clock:2, client: 2).
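To make the bookkeeping concrete, here is a minimal sketch of how such a map could be populated. It is a toy model, not the Yjs source: the `MissingSV` type and the `recordMissing` helper are hypothetical names, and keeping only the lowest clock per client is an assumption made for illustration.

```typescript
// Toy model (hypothetical, not taken from Yjs):
// client id -> lowest clock that is known to be missing.
type MissingSV = Map<number, number>;

// Record that the operation (client, clock) is a dependency we do not have yet.
// Assumption: only the lowest missing clock per client is kept.
function recordMissing(sv: MissingSV, client: number, clock: number): void {
  const known = sv.get(client);
  if (known === undefined || clock < known) {
    sv.set(client, clock);
  }
}

const missingSV: MissingSV = new Map();

// o1 depends on the missing o2(clock: 3, client: 2).
recordMissing(missingSV, 2, 3);

// Later, o2 arrives and turns out to depend on o3(clock: 2, client: 2).
recordMissing(missingSV, 2, 2);

console.log(missingSV.get(2)); // 2
```

Nothing in this sketch tries to enumerate every missing operation; it only notes the dependencies that were actually encountered.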

We don't need any more information to determine that something is missing.

The reason why I think the current approach is preferable is as follows: We should not try to apply o2 with missingSV( {2: 3} ) once we receive o4(client: 2, clock 0), because it still does not resolve the missing dependency.
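The difference between the two approaches can be sketched with a small retry check. This is an illustrative model only: `shouldRetry` and `LocalState` are hypothetical names, and the sketch assumes that o4 has length 1, so the local clock for client 2 becomes 1 after it is applied.

```typescript
// Hypothetical toy type: client id -> next clock we expect from that client.
type LocalState = Map<number, number>;

// Retry a pending update only if at least one operation recorded as missing
// has been applied locally in the meantime.
function shouldRetry(missing: Map<number, number>, local: LocalState): boolean {
  for (const [client, clock] of missing) {
    if ((local.get(client) ?? 0) > clock) {
      return true;
    }
  }
  return false;
}

// Local state after receiving o4(client: 2, clock: 0), assumed to have length 1.
const localAfterO4: LocalState = new Map([[2, 1]]);

// Current behaviour: missingSV is {2: 3}, so o4 does not trigger a retry.
const retryWithCurrent = shouldRetry(new Map([[2, 3]]), localAfterO4); // false

// Proposed behaviour: missingSV is {2: 0}, so o4 triggers a retry
// even though o2(clock: 3) is still missing.
const retryWithProposed = shouldRetry(new Map([[2, 0]]), localAfterO4); // true

console.log(retryWithCurrent, retryWithProposed);
```

Under these assumptions, starting the entry at 0 causes a retry that cannot succeed, which is the wasted work the current approach avoids.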

missingSV is not required to be "complete" (i.e., to contain every operation that the update might require); computing that would also be much more expensive. It should, however, be sound: it must contain only operations that are actually required.

I think I get what you are trying to achieve. If you notice that there are missing operations, then you want to send the missingSV instead of recalculating the full state vector. There are other edge cases when this might not work in the current codebase of Yjs. One example: we only compute the first "missing operation" for each operation (first we report origin, then we might report rightOrigin).

Furthermore, you might still have other lost operations that are not dependencies of any other operation. Losing updates in a client-server model with a reliable protocol (TCP/WS) is a REALLY bad thing. It means that somewhere in the codebase, updates are not propagated correctly. You can't detect all kinds of data loss with missingSV; it only captures a small number of edge cases.

A quick fix would be to do a full resync at regular intervals. This would at least catch all kinds of data loss. However, it might make sense to re-evaluate the codebase itself. Often, user code throws exceptions, which prevents the propagation of otherwise well-formed updates. In the case of an exception, the connection should be closed.
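The following sketch shows why a state-vector-based resync catches losses that missingSV cannot. It uses a toy replica rather than a real Yjs document: `Op`, `Replica`, `stateVector`, `diffSince`, `applyOps`, and `resync` are all hypothetical names, and the model assumes each client's operations form a gap-free list indexed by clock.

```typescript
type Op = { client: number; clock: number; payload: string };
// Toy replica: client id -> operations in clock order, without gaps.
type Replica = Map<number, Op[]>;

// What this replica has applied: client id -> next expected clock.
function stateVector(r: Replica): Map<number, number> {
  const sv = new Map<number, number>();
  for (const [client, ops] of r) sv.set(client, ops.length);
  return sv;
}

// Every operation in `r` that a peer with state vector `sv` does not have.
function diffSince(r: Replica, sv: Map<number, number>): Op[] {
  const out: Op[] = [];
  for (const [client, ops] of r) out.push(...ops.slice(sv.get(client) ?? 0));
  return out;
}

// Apply operations in order; duplicates and out-of-order ops are skipped here.
function applyOps(r: Replica, ops: Op[]): void {
  for (const op of ops) {
    const list = r.get(op.client) ?? [];
    if (op.clock === list.length) list.push(op);
    r.set(op.client, list);
  }
}

// Full two-way resync: each side sends what the other side lacks.
function resync(a: Replica, b: Replica): void {
  applyOps(a, diffSince(b, stateVector(a)));
  applyOps(b, diffSince(a, stateVector(b)));
}

const server: Replica = new Map([
  [1, [{ client: 1, clock: 0, payload: "a" }, { client: 1, clock: 1, payload: "b" }]],
  [7, [{ client: 7, clock: 0, payload: "x" }]], // never reached the peer
]);
const peer: Replica = new Map([
  [1, [{ client: 1, clock: 0, payload: "a" }, { client: 1, clock: 1, payload: "b" }]],
]);

// Nothing the peer holds depends on client 7, so a missingSV would stay empty.
resync(server, peer);
console.log(peer.get(7)?.length); // 1
```

In an application built on Yjs, the same exchange would presumably be expressed with `Y.encodeStateVector`, `Y.encodeStateAsUpdate`, and `Y.applyUpdate`, triggered by a timer; the interval and transport are left open here.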

Horusiath · 2024-05-03T02:53:37Z

Yeah, I think I've galloped too far. With this PR's approach, the missing state vector would actually be equivalent to the document's own state vector. The question would be: what could the existing state vector be used for?

dmonad · 2024-05-03T09:02:15Z

The use-case of missingSV is only "if any of the ops in missingSV is applied, then we can retry applying the associated update".

There is probably little overlap with the state vector, which describes "what we have applied to our own document".

Horusiath added 2 commits May 2, 2024 08:10

missing state vector should be usable for update encoding

443335f

fix: initialize missing state vector with 0 for new client

27299f7

Horusiath mentioned this pull request May 2, 2024

fix: missing state vector for new client id y-crdt/y-crdt#423

Closed
