Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: diff for Vector #181

Open
ckaran opened this issue Mar 15, 2021 · 3 comments
Open

Feature request: diff for Vector #181

ckaran opened this issue Mar 15, 2021 · 3 comments

Comments

@ckaran
Copy link

ckaran commented Mar 15, 2021

It it possible to implement a difference or edit iterator for Vector? I want to do something similar to 'rsync', but with the difference that each node in the system knows what the other node already knows. This could be done pretty easily using Vector:

  1. Create an original (empty) Vector, which will result in an empty patch set. Send that to the 'other side'.
  2. Clone the above Vector and mutate the clone. Because Vectors are immutable, the clone is almost zero cost.
  3. Mutate the clone as needed.
  4. The feature request -> Iterate over the differences between the clone and the original periodically, creating an edit set.
  5. Send the edit set to the 'other side'.
  6. Once you receive confirmation that the 'other side' has updated itself to match your clone, discard the original Vector from step 1.
  7. Rename the clone as the original.
  8. Goto step 2.

EDIT

The reason I think that im-rs is the right crate for this is because of the sharing between clones. There is no need to guess what the best match between the clones are, you already know because only those elements that have been lazily cloned could possibly be edited. You can either ship those edits to the other side directly, or further reduce the edit sets by comparing the subsequences individually to filter out changes that are actually identical (that is, if the new value in the clone at that position is exactly equal to the old value). In either case, for large sequences, Vector could speed things up tremendously.

@ckaran
Copy link
Author

ckaran commented Mar 19, 2021

I just ran across treediff. It may be of help here.

@ollpu
Copy link

ollpu commented Apr 26, 2021

This could also be very useful with maps.

In general, iterating over the data structure tree and looking for differences is not trivial, because nodes can also move, during tree rebalancing and such. I'm not too familiar with the implementations, but it might be impossible without storing more metadata for some of them.

@ckaran
Copy link
Author

ckaran commented Apr 27, 2021

@ollpu Agreed! Honestly, it would be great if there a trait that enabled this that all of the collections implemented. E.g., something like:

pub trait Changes {
    /// Returns an iterator of changes between this version and the prior version.
    ///
    /// Returns an object that is able to iterate over the changes between this
    /// version and the immediate predecessor version.  The iterator object 
    /// **must** be valid even if the predecessor version or this version of the
    /// object are dropped.
    fn change_iterator(&self) -> Result<ChangeIterator, Error>;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants