Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using diamond-types with an external data structure #15

Open
archseer opened this issue Mar 17, 2022 · 12 comments
Open

Using diamond-types with an external data structure #15

archseer opened this issue Mar 17, 2022 · 12 comments

Comments

@archseer
Copy link

I've been meaning to experiment with diamond-types in Helix which already uses Ropey + an OT Transaction to apply changes.

Would it be possible to use diamond-types by feeding it edits then extracting a list of operations to replay on the rope? (I think Atom's teletype-crdt worked in a similar way.) That way we can only use CRDTs when a collaboration session is started and just use the plain rope when editing locally. I saw you had a module that supported this prior to the rewrite, but it's gone now.

@josephg
Copy link
Owner

josephg commented Mar 17, 2022

Yeah sure can. Internally operations are stored in their "original" position - ie, at the cursor position when the user created the change. But this isn't very useful, because you can't just replay those changes into a rope or you'll end up with weird broken stuff. (Eg, if two users concurrently delete the same thing, you'll get a broken state).

Generally you need those operations "transformed".

I've had code to do that for awhile so I could make a rope at some version (Branch::merge). I refactored it into an iterator the other week while I was working on my wiki. (I needed the sequence of operations in order to transform cursor positions).

The function you're looking for is OpLog::get_xf_operations(&self, from_version, to_version), or the simpler OpLog::get_all_xf_operations(&self). It returns an iterator over (TimeSpan, Option<Operation>). (Thats more complicated than it should be - for now call .filter_map(|(_v, op)| op), which would just give you an iterator over Operation).

Internally it runs the merging algorithm to give you a sequence of "transformed operations" that you can just blast into a rope, or do whatever you want with. Operation looks like this:

pub struct Operation {
    pub tag: InsDelTag, // enum InsDelTag { Ins, Del }
    pub span: TimeSpanRev {
         // Where the change happened
         // span.start is the insert position, span.start..span.end is the deleted range)
         pub span: TimeSpan,
         pub fwd: bool, // Don't worry about this for now.
    },

    // The inserted content. Will always be set for inserts unless you discard inserted content when saving.
    pub content: Option<SmartString>,
}

I only added it the other week and I haven't documented it.

Actually I haven't documented most of the DT API - I want to tidy up a lot of things at some point before I publish it and promote it more. I think its ready for that work though. The code is good and it works. There's always more features to add but the core is solid.

If you're goofing around from cargo I'm happy to publish another version. I think it might be time to start getting some documentation polish and use.

@josephg
Copy link
Owner

josephg commented Mar 17, 2022

And if you want to see an example of all this, I made a simple web based wiki on top of diamond types which is live (at the moment) at wiki.seph.codes/whatever. The code for the wiki is here if you want to see how it works. The frontend itself is in javascript though - it'd be way cooler to have something working in helix!

And let me know if you want a hand hooking that up. I'd be happy to jump on a zoom call and talk it through. Being able to use DT to edit a document in helix would be delightful! (My email is in my commits)

@archseer
Copy link
Author

Thanks for the writeup!

Nice, seems like it'll be fairly easy to translate the Operations into Changes then construct a transaction.

I'm a bit fuzzy on how to actually synchronize these changes over a network though. I see the wiki is using Braid for the protocol layer, @brynbellomy has been telling me about it for a while so I'll see if I can build a Braid implementation on top of rust-libp2p. I'm still new to CRDTs though so I got some weekend reading/watching to do 😄

@josephg
Copy link
Owner

josephg commented Mar 18, 2022

Oh nice! Yeah those changes should be pretty compatible. Are you using byte offsets or counting characters for the position? DT counts unicode characters, so fingers crossed!

Yeah essentially, if A is syncing with B, A keeps track of B's version of the document. Whenever A has new changes, A encodes them and sends them over the network to B. (And vice versa).

When DT merges patches, it'll skip any operations it already has. So you don't need to worry about debouncing or anything like that. The laziest collaborative editor could work by just repeatedly sending the whole document history to the other peer, and calling merge_bytes(). Performance would be awful but it'd work.

Versions are a little more interesting, because there's two kinds of versions:

  • Local versions (&[usize] with 1+ items) are easier & faster to work with. The local set of changes is internally sort of stored in a list. A local version is just the index of the last known change in that list. Well, but when concurrent changes happen, it names both / all changes that have been merged together. (Its like parents in a git commit.) But the operations don't have a globally defined order, so local versions only make sense locally. They can even change when you save then re-load the same document from disk.
  • Remote versions - which are the same list of items, but instead of an index you store a list of full IDs - eg [{agent: 'xyz', seq: 123}, ...]. You can convert back and forth with oplog.frontier_to_remote_ids and remote_ids_to_frontier.

I need to rename those methods - I used to call local versions 'frontiers' and I haven't gotten around to purging all references to the old name.

If you want to see all this in action, hidden in the bottom right of the wiki is a 'info' button which will pop out a box showing you the local and 'server' versions while you make changes. You can try going offline and making changes and seeing what happens (and what happens when you reconnect).

You create a (binary) patch with oplog.encode_from(ENCODE_PATCH, &from_version) -> Vec<u8> and merge it with oplog.merge_bytes(), which returns the new remote peer's version.


There's one thing I haven't done a good job of yet - and that is, if two peers both have local changes the other doesn't know about, and they reconnect, then right now its awkward & awful to figure that out. The lazy / awful workaround for now is to just send your entire local history when you connect. Which isn't as bad as it sounds, but I want a better answer to this at some point. I know how I want it to work; I just haven't written the code yet.

@archseer
Copy link
Author

I've been having problems importing latest master:

diamond-types = { git = "https://github.com/josephg/diamond-types" }

Fails with a bunch of build problems under old_experiments. It's supposed to fetch just that specific crate in the workspace but for some reason it discovers the diamond-types crate under old_experiments and attempts to use that. (I'd argue that's actually a cargo bug because it's not following the root Cargo.toml)

@josephg
Copy link
Owner

josephg commented Mar 21, 2022

Oh thats annoying. In the meantime you should be able to do a local checkout of diamond types and explicitly use path: ../diamond-types/crates/diamond-types. I'll update the version in cargo soon.

@archseer
Copy link
Author

Could you rename the old directory? That way it would work as a git dependency without requiring a release

@josephg
Copy link
Owner

josephg commented Mar 25, 2022

Uuh, rename it to what? old_experiments is already an arbitrary directory name. I’m confused why cargo is even looking for files in there. What would help?

@josephg
Copy link
Owner

josephg commented Mar 25, 2022

Ooohhh because there’s a crate in there called diamond-types crate it’s trying to build? Happy to give that a try.

@archseer
Copy link
Author

archseer commented Mar 25, 2022

I think you need to rename the crates themselves to diamond-types-old. It ignores the workspace root Cargo.toml and crawls all the subfolders of a git dependency to find a crate that matches: rust-lang/cargo#1462 (comment)

@josephg
Copy link
Owner

josephg commented Mar 25, 2022

Ok give that a try! You might need to clear cargo's cache / the pinned commit in Cargo.lock

@josephg
Copy link
Owner

josephg commented Mar 25, 2022

Also FYI I went through and did a whole lot of spring cleaning of the public API a couple of days ago, because you're looking to use it. I renamed a whole heap of methods, and added documentation for a bunch of public functions. I figure cleaning up the API now is better than cleaning it up later. Eg, operation.span is now operation.loc because that field describes where the change happened, not when. And I renamed TimeSpan to DTRange because its just a range and I use it like ranges everywhere internally. (I would just use std::Range but that doesn't implement Copy.)

All the same code is still there. Let me know if there's anything thats not clear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants