Using yjs with a firestore backend #189

Closed
websiddu opened this issue Apr 4, 2020 — with Huly GitHub · 23 comments


websiddu commented Apr 4, 2020

Maybe this is a wrong question, but I'm going to ask anyway.

Is there a way to use yjs with firebase/firestore as a backend? I see a lot of demos from the client side, but it would be extremely helpful to see the server code as well. For example, how is the server implemented for prosemirror?

Huly®: YJS-416

@dmonad
Member

dmonad commented Apr 6, 2020

Hey @websiddu,
Something unique about Yjs is that it works with many backends and with many editors. y-websocket enables Yjs to share document updates using a star topology and websocket connections. y-webrtc gives Yjs the ability to connect to other peers directly over a p2p WebRTC connection. The available providers are linked in the readme.

If you use Firestore, you can use it to store Yjs document updates directly in Firestore, without a y-websocket server. But Firestore is too slow, and too expensive for a viable Yjs backend. Nevertheless, you could easily create your own Firestore adapter for Yjs using this simple API: https://github.com/yjs/yjs#Document-Updates

@dmonad dmonad closed this as completed Apr 15, 2020
@Fibs7000

I have to correct you: with the Realtime Database, the speed is more than adequate (especially if it is used exclusively for collaborative editing).

The following shows writing (and afterwards deleting) around 100 words on one client.

image

So it is probably very expensive if collaborative editing is the main focus. But if users mostly edit on their own and only occasionally collaborate, Firebase could be very well suited.

The setup I made was the following:

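// Namespaced Firebase API, as used in the original snippet.
import firebase from 'firebase/app';
import 'firebase/database';
import { Doc, applyUpdate } from 'yjs';
import * as awarenessProtocol from 'y-protocols/awareness';
// js-base64 is one option for base64 encoding; any equivalent helper works.
import { fromUint8Array, toUint8Array } from 'js-base64';

class FirebaseProvider {
  awareness: awarenessProtocol.Awareness;
  private ref = firebase.database().ref('docA');

  constructor(public doc: Doc) {
    this.awareness = new awarenessProtocol.Awareness(doc);
    // Apply each stored update, tagged with `this` as origin so it is not echoed back.
    this.ref.on('child_added', (snapshot) => {
      applyUpdate(doc, toUint8Array(snapshot.val()), this);
    });
    // Updates are binary, so push them as base64 strings rather than raw objects.
    doc.on('update', (update: Uint8Array, origin: unknown) => {
      if (origin !== this) this.ref.push(fromUint8Array(update));
    });
  }

  disconnect() {}
  destroy() {}
}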

best regards,

Fabio

@dmonad
Member

dmonad commented Oct 13, 2020

Thanks for sharing this @Fibs7000

I recently found out that the NY Times is using Firestore to sync document edits (not using Yjs though). My assumption is that collaborative editing would be too expensive, as each single keystroke counts as a document update. For most companies, Firestore is too expensive in this use case. However, if you only serve a small number of users, or expect very few edits, then Firestore is a viable option.

With 10k users each producing 100k keystrokes a month, you would end up with a bill of $1,800 just for collaborative editing. This doesn't even account for propagating awareness updates (cursor movements).

However, this would only be about 18 cents per user, which one could argue is fine.
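
The estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes a price of $0.18 per 100,000 document writes, which is the rate implied by the figures in this comment and not a quoted price list, so check current Firestore pricing before relying on it. The function and parameter names are made up for illustration.

```typescript
// Hypothetical helper: estimates the monthly write bill when every keystroke
// is stored as one document write.
// ASSUMPTION: 0.18 USD per 100,000 writes is inferred from the numbers in the
// comment above, not taken from an official price list.
function estimateMonthlyWriteCost(
  users: number,
  keystrokesPerUser: number,
  pricePer100kWrites: number = 0.18
): { totalUsd: number; perUserUsd: number } {
  const totalWrites = users * keystrokesPerUser;
  const totalUsd = (totalWrites / 100000) * pricePer100kWrites;
  return { totalUsd, perUserUsd: totalUsd / users };
}

const estimate = estimateMonthlyWriteCost(10000, 100000);
console.log(estimate.totalUsd.toFixed(2));   // "1800.00"
console.log(estimate.perUserUsd.toFixed(2)); // "0.18"
```

Awareness traffic (cursor movements) is not included, so a real bill would likely be higher than this figure.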

It just seems like a waste, since a single $5 y-websocket server could handle the same workload. But this requires maintenance and doesn't solve authentication.

Firestore could be a viable option in some use-cases. It would definitely be a welcomed addition to the Yjs project. I just want to ensure that you understand the trade-offs.


Edit Feb. 2, 2022: Corrected Firestone⇒Firestore (my phone autocorrected and I didn't notice)

@Fibs7000

Yes I absolutely understand the tradeoffs. I wouldn't use Firebase myself either... I'm currently working on a project, where we are going to build a Websocket server, to be cheaper and more scalable.

But especially for testing things out and for very small projects, Firebase could very well be an option.

@aloncarmel

I'll drop my 2 cents here. I've been using the Firebase Realtime Database as the backend for my collaborative editor, using OTjs. I've scaled this, and it is not expensive: with huge workloads of students, it ended up costing $50-$80 for gigabytes of data being edited. That is obviously not $5, but it is a viable solution for those who want a worry-free, zero-maintenance backend and are willing to spend some $$$ for it.

@aloncarmel

@Fibs7000, regarding the FirebaseProvider setup you posted above: would you be willing to share a full code example of loading a Monaco editor with Firebase as the backend using your adapter?

@petrbela

I'm trying to use this with tiptap and I'm getting a "decoder.array.subarray is not a function" error from yjs. Has anyone else encountered this?

@dmonad
Member

dmonad commented Jan 21, 2022

@petrbela, please don't store the document updates using JSON encoding. These are binary buffers and not JSON objects. Using JSON encoding on updates is incredibly inefficient (overhead of 20x in size and 100x computation overhead). Store updates in Firestore using base64 encoding instead: https://docs.yjs.dev/api/document-updates#example-base64-encoding
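
To make the difference concrete, here is a small sketch of why a JSON round trip breaks an update and how a base64 round trip preserves it. It uses a stand-in byte array rather than a real Yjs update, and the helper names `toBase64` and `fromBase64` are hypothetical; a base64 library would work just as well. `btoa` and `atob` are assumed to be available (browsers and recent Node.js versions).

```typescript
// Stand-in for a Yjs update: any Uint8Array behaves the same way here.
const update = new Uint8Array([1, 2, 3, 250, 255]);

// JSON turns the typed array into a plain object keyed by index, so binary
// methods such as `subarray` no longer exist after parsing.
const viaJson = JSON.parse(JSON.stringify(update));
console.log(typeof viaJson.subarray); // "undefined"

// Hypothetical helper: encode bytes as a base64 string.
function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: decode a base64 string back into bytes.
function fromBase64(text: string): Uint8Array {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

const stored = toBase64(update);     // a plain string, safe to store in a database field
const restored = fromBase64(stored); // a real Uint8Array again
console.log(restored instanceof Uint8Array); // true
```

The restored value can be passed to `Y.applyUpdate` like any other update, because it is a genuine `Uint8Array` rather than a parsed object.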

@dmonad
Member

dmonad commented Jan 21, 2022

(this question has been answered multiple times. The mentioned error only occurs when you use JSON encoding)

@aaronncfca

aaronncfca commented Jul 29, 2022

Disclaimer: I'm new to Yjs and fairly new to Firebase. But FYI (if I understand correctly): Firebase Firestore and Firebase Realtime Database are two different products for different uses.

In my case, I would like to persist data locally (y-indexeddb perhaps) and sync updates to Firestore on an interval; e.g. every 5 seconds. Ideally all updates would be combined as a single write to 1 database doc, without overwriting the doc (e.g. adding an array of updates to the doc's existing array, which Firestore supports). Is it possible to do that directly from a client? Would it be as easy as collecting updates via ydoc.on('update', ...), syncing to the database at my leisure, and then applying new updates via Y.applyUpdate(ydoc, ...) when fresh ones* come in from the database?

(* I'm not quite sure how to determine which updates are "fresh" to a given client, since firestore will only alert clients that the doc was updated, not which array entries are new.)

@dmonad
Member

dmonad commented Jul 29, 2022

Is it possible to do that directly from a client?

Only one way to find out ;)

I'd say that it's possible. However, different approaches have different tradeoffs. I'd start by building something that works and then optimizing it.

@petrbela

petrbela commented Jul 29, 2022

@aaronncfca Instead of figuring out which updates are new, you can probably use mergeUpdates to merge the incoming document with the local one.

Also if you're not building a real-time collaborative editor, you might just store the document directly instead of managing yjs updates.

@trafnar

trafnar commented Sep 13, 2022

I was trying to use firebase for a quick prototype today and started wondering how best to do this.

I started by sending an update to the server on each keystroke. I guess that's ok, but it means sending data very frequently. Perhaps I can debounce this and squash several updates into one?
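
One way to debounce and squash is to buffer updates and flush them as a single merged update after a quiet period. The sketch below is library-agnostic: `UpdateBatcher`, `merge`, and `send` are hypothetical names, and the merge function is passed in. With Yjs, `Y.mergeUpdates` would be the natural candidate for `merge`; the plain concatenation used here is only a placeholder so the example runs on its own, and it is not a valid way to combine real Yjs updates.

```typescript
type Merge = (updates: Uint8Array[]) => Uint8Array;
type Send = (merged: Uint8Array) => void;

class UpdateBatcher {
  private pending: Uint8Array[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private merge: Merge,
    private send: Send,
    private delayMs: number
  ) {}

  // Call this from the document's update handler. Each call restarts the
  // quiet-period timer, so a burst of keystrokes results in one send.
  add(update: Uint8Array): void {
    this.pending.push(update);
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Sends everything buffered so far as one merged update.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(this.merge(batch));
  }
}

// Placeholder merge (plain concatenation) so the sketch is self-contained.
// ASSUMPTION: with Yjs you would pass Y.mergeUpdates instead.
const concatenate: Merge = (updates) => {
  let total = 0;
  updates.forEach((u) => { total += u.length; });
  const out = new Uint8Array(total);
  let offset = 0;
  updates.forEach((u) => { out.set(u, offset); offset += u.length; });
  return out;
};

const sent: Uint8Array[] = [];
const batcher = new UpdateBatcher(concatenate, (merged) => { sent.push(merged); }, 1000);
batcher.add(new Uint8Array([1]));
batcher.add(new Uint8Array([2]));
batcher.add(new Uint8Array([3]));
batcher.flush(); // flush immediately instead of waiting for the quiet period
console.log(sent.length); // 1
```

A pure debounce never fires while the user keeps typing without pause, so a maximum wait time may be worth adding in practice.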

Then I listen for incoming changes that originated from other clients, and merge those into the document in the client's browser, checking to make sure they originated from the server and not the client itself.

This works, but there's one main problem that occurred to me: what if one update fails to reach the server? If for some reason an update isn't sent, it would then never be persisted to the server.

A solution to this would be to always send the entire document squashed into a single update, but that would mean either:

  1. overwriting what's on the server, losing updates from the other client, or
  2. storing this as a list of updates, each consisting of the entire doc up to that point, which wastes too much space

I guess the right way to do this is to use state vectors to figure out the diffs and send those as updates to the server, which would (I think) catch any missing updates that happened along the way?

But that means the server needs to run YJS, which means perhaps using Firebase's cloud functions, causing much more complication to my system.

Another option I can imagine is having the server send down the entire document as one big update, which the client merges with, then sends the merged document back up, overwriting what is stored. That would only cause a problem if another client made an update during that moment, but in my low-frequency use case that would be unlikely, and I assume it would settle as long as both clients continued to make updates that don't interrupt each other like that.

The range of choices and considerations here is daunting and I'm feeling a bit lost that there is no clear direction on how to do this. Everything else with YJS has been amazingly simple to implement so I'm really hoping to figure out this piece so I can ship my app!

@trafnar

trafnar commented Sep 16, 2022

I was able to write a cloud function that listens for writes to my realtime database and merges the incoming document update with the stored one. This works nicely.

However, it means sending the entire document as an update each time.

It would be nice to send just the incremental updates as others have done, but I worry about an update failing to reach the server and causing a gap in my history. Next I will try using state vectors to calculate the differences and only send what's needed; then, if an update fails, the next update would know what it needs to send to make up for it (as described here: https://docs.yjs.dev/api/document-updates#example-sync-two-clients-by-computing-the-differences).

@trafnar

trafnar commented Sep 16, 2022

I modified my code so that the server maintains a list of updates, and on each write to that list, generates and stores a state vector based on those updates. It gives that state vector to clients, who use it when generating updates to send to the server. This way if there is ever a "hole" (missed update) in the server data, it will be filled because we are now sending diffs, not just individual updates.

I found that if there is a hole in the list of updates, all subsequent updates aren't applied, so the off chance that an update fails to make it to the server is fairly serious.
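
The idea of using a state vector to fill holes can be illustrated with a toy model. This is not Yjs's binary format: here a state vector is just a map from client ID to the number of operations seen from that client, and all type and function names are hypothetical. In Yjs itself the corresponding calls are `Y.encodeStateVector(doc)` and `Y.encodeStateAsUpdate(doc, stateVector)`.

```typescript
interface Op { client: number; clock: number; value: string }
type StateVector = Record<number, number>;

class Replica {
  private ops: Op[] = [];
  private next: StateVector = {}; // next expected clock per client

  stateVector(): StateVector {
    return { ...this.next };
  }

  // An op is integrated only if it is the next one in sequence for its client.
  // Toy simplification: out-of-sequence ops are skipped here, whereas Yjs
  // keeps them pending until the missing data arrives.
  apply(incoming: Op[]): void {
    const sorted = incoming.slice().sort((a, b) => a.clock - b.clock);
    for (const op of sorted) {
      const expected = this.next[op.client] || 0;
      if (op.clock !== expected) continue;
      this.ops.push(op);
      this.next[op.client] = expected + 1;
    }
  }

  // Everything the remote side has not seen yet, according to its state vector.
  diff(remote: StateVector): Op[] {
    return this.ops.filter((op) => op.clock >= (remote[op.client] || 0));
  }

  size(): number {
    return this.ops.length;
  }
}

const client = new Replica();
const server = new Replica();
const typed: Op[] = [
  { client: 1, clock: 0, value: 'a' },
  { client: 1, clock: 1, value: 'b' },
  { client: 1, clock: 2, value: 'c' },
];
client.apply(typed);

server.apply([typed[0]]);   // arrives
                            // typed[1] is lost in transit
server.apply([typed[2]]);   // arrives, but cannot be integrated past the hole
console.log(server.size()); // 1

// Sending a diff against the server's state vector resends the lost op too.
server.apply(client.diff(server.stateVector()));
console.log(server.size()); // 3
```

Because the diff is computed from what the server actually has, a lost message is repaired by the next successful one instead of blocking everything after it.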

@aaronncfca

aaronncfca commented Sep 16, 2022

Hey @trafnar, sorry I intended to respond to your earlier comment, but it escaped me amongst overseas travels. My app isn't production-ready yet, but I did connect it to a Firestore backend for yjs synchronization. My solution to the problems you've mentioned is:

  1. A simple backend receives updates and merges them with the existing document in Firestore. I wrote this as a Cloud Function, but then ported it to run on a separate server for testing.
  2. The client app keeps both a copy of the full yjs document and a diff of any local updates in persistent local storage. New updates are merged into the diff, which is PUT to the server every few seconds (no need to get a state vector from the server, since all new updates are in the diff). The diff is cleared only after the PUT request succeeds, so failed requests can be retried later since the diff is persisted. However, I haven't finished implementing the error-handling logic or done extensive testing yet.

I think my solution may avoid the "hole" problem you describe in that a "failed update" is still persisted and will be merged with future updates, rather than becoming a forgotten "hole". Let me know if that is clear!
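
A minimal sketch of the "clear the diff only after the request succeeds" idea follows. It is not aaronncfca's actual code: `PendingDiff` and its methods are hypothetical names, persistence to local storage is left out, and the merge function is injected (with Yjs, `Y.mergeUpdates` would be a candidate). The concatenating merge is only a placeholder so the example runs on its own.

```typescript
type MergeUpdates = (updates: Uint8Array[]) => Uint8Array;

class PendingDiff {
  private unsent: Uint8Array | null = null;   // not yet handed to a request
  private inFlight: Uint8Array | null = null; // handed to a request, not yet confirmed

  constructor(private merge: MergeUpdates) {}

  // Fold a new local update into the unsent diff.
  add(update: Uint8Array): void {
    this.unsent = this.unsent === null ? update : this.merge([this.unsent, update]);
  }

  // Returns the bytes to PUT, or null if there is nothing to send or a
  // request is already running.
  beginSend(): Uint8Array | null {
    if (this.inFlight !== null || this.unsent === null) return null;
    this.inFlight = this.unsent;
    this.unsent = null;
    return this.inFlight;
  }

  // Request succeeded: only now is the sent data dropped.
  confirm(): void {
    this.inFlight = null;
  }

  // Request failed: put the data back, ahead of anything added meanwhile.
  fail(): void {
    if (this.inFlight === null) return;
    this.unsent = this.unsent === null
      ? this.inFlight
      : this.merge([this.inFlight, this.unsent]);
    this.inFlight = null;
  }
}

// Placeholder merge (plain concatenation), NOT a valid Yjs merge.
const joinBytes: MergeUpdates = (updates) => {
  let total = 0;
  updates.forEach((u) => { total += u.length; });
  const out = new Uint8Array(total);
  let offset = 0;
  updates.forEach((u) => { out.set(u, offset); offset += u.length; });
  return out;
};

const pending = new PendingDiff(joinBytes);
pending.add(new Uint8Array([1]));
pending.add(new Uint8Array([2]));
const firstAttempt = pending.beginSend(); // bytes 1, 2
pending.add(new Uint8Array([3]));         // typed while the request is running
pending.fail();                           // request failed, nothing is lost
const retry = pending.beginSend();        // bytes 1, 2, 3
pending.confirm();                        // request succeeded
const nothingLeft = pending.beginSend();  // null
```

A caller would invoke `beginSend` on a timer, PUT the returned bytes, and then call `confirm` or `fail` depending on the response.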

@trafnar

trafnar commented Sep 19, 2022

Thanks for sharing that.

I've modified my system to use a cloud function endpoint where I can send a state vector and get back a diff. This is nice because it fixes any holes in the client (which are unlikely anyway), and it also means I never send more data to the client than is needed, for example when the client already has some of the data in local IndexedDB storage.

My system now works like this:

On first load:

  • send local state vector to firebase and get back a diff
  • subscribe to the server's state vector (whenever it changes, client will get the new one)

On each local update:

  • Store update locally with y-indexeddb
  • Each time the client generates an update, create a diff using the latest state vector received from the server and send a diff to firebase
  • A cloud function recalculates the server-side state vector and updates it in firebase

I don't actually subscribe to firebase updates although I could. Instead I added the webrtc provider so that if two clients are connected, they will get updates peer to peer. If I subscribe to firebase updates, the same thing would happen via firebase.

I'd love to know if anyone sees any issues with this strategy.

@aaronncfca

aaronncfca commented Sep 19, 2022

@trafnar I appreciate you sharing this! My one thought is that updates can be generated very rapidly (per letter as the user types), so you may want to ensure you don't send the diff to Firebase more often than it can handle (once per second with Firestore, I believe) or more often than makes sense for your use case and cost/benefit.

The issue with webrtc would obviously be a use case where only one client at a time is online, which is why my current implementation subscribes to updates from Firestore. I believe yjs would allow you to do both if you chose to.

I am new at this also, so would love to hear other perspectives as well!

@trafnar

trafnar commented Sep 19, 2022 via email

@gmcfall

gmcfall commented Mar 16, 2023

I have recently created a Yjs Firestore Provider.

Check out yjs-firestore-provider

@dmonad
Member

dmonad commented Mar 16, 2023

Thanks for sharing @gmcfall! Once you have Awareness working, would love to see a PR to the docs that refers to your project ;)

@deathg0d

Here's a slightly different implementation of a yjs firestore provider: y-fire. We're currently using it for some of our projects. If you have suggestions or find any bugs, please send them our way!

dmonad added a commit that referenced this issue Feb 20, 2024
@dmonad
Member

dmonad commented Feb 20, 2024

Thanks for sharing @deathg0d ! I added y-fire to the README! (I hope that's okay)

Also thanks for linking to yjs-firestore-provider. @gmcfall I didn't see a PR, but I'd be happy to add your provider as well!
