Proposal: a radically simpler Peerstore #2355
Comments
The Peerstore (or AddrStore, or whatever name this component ends up with) is still very useful for applications, so please keep it. Furthermore, the datastore-backed implementation is helpful for networks wishing to bootstrap from non-bootstrapper peers, so please keep it as well. Such applications would store peers in the component provided by go-libp2p instead of reinventing the wheel. If libp2p does not provide any means to persist addresses and relies only on a centralized set of bootstrappers, is it really p2p? These two become even more helpful together with application-defined metrics. Currently, only GossipSub does extensive scoring in the libp2p world, but GossipSub's peer quality metric is also meaningful for other protocols. Usually, if GossipSub thinks a peer is behaving poorly, the chance that it behaves poorly in other protocols is high. Potentially, multiple protocols could talk to some sort of libp2p-provided PeerScorer/Tracker to maintain a subjective image of the peer.
Thanks for your feedback @Wondertan.
Unless the application is only interested in storing data for the lifetime of the underlying connection, it's not really useful, is it? And even that use case can trivially be replicated with a map and ~10 lines of code handling the event bus notifications. For most applications, more sophisticated cleanup logic will be required. Fundamentally, this is a question about correct layering. Why would libp2p expose a general interface for peer-related information? Just because we define the
libp2p doesn't rely on bootstrappers, Kubo does (did). With the v0.21 release, they're persisting peers and not relying on the bootstrappers for subsequent starts, without using the database-backed peerstore. It really doesn't make a lot of sense to store information that you regularly need in a database, just so you can persist that information on shutdown. That data should live in memory, and be persisted to disk on shutdown (or at regular intervals). The new peerstore could expose a method to export a snapshot of its current state.
That's not possible with today's peerstore either. I agree that this could potentially be useful, but will require some careful abstractions. Not every implementation will / should be familiar with the intricacies of GossipSub's scoring function. I doubt that the right approach is to extend the existing
Thanks for writing this up. I agree with all points. I'm thinking about how to ease the transition for users who are relying on this. Maybe we could expose a new package that lets user protocols still use the existing Peerstore interface (
I agree with all the points. How should we handle removing PeerMetadata?
@Wondertan what application-specific usages do you have in mind other than

I find the current Peerstore API very confusing. One reason applications cannot use it correctly is that for all TTLs except permanentTTL, we remove peers 30 minutes after disconnect. This leads to hacks like this: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/dht.go#L549
Discussed this with @Stebalien. It's probably useful to have access to the protocols supported on a connection. It makes sense to store this information with the connection, not in the peerstore. That would allow us to (explicitly) support a different set of protocols on relayed connections. |
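A sketch of what storing protocols on the connection could look like. `ConnProtocols` and its methods are hypothetical (nothing like this exists on go-libp2p's `Connection` interface today); the sketch also shows how per-connection state can distinguish "Identify not yet completed" from "peer doesn't support the protocol".

```go
package main

import (
	"fmt"
	"sync"
)

// ConnProtocols attaches the protocol list learned via Identify to a
// single connection, rather than to the peer as a whole. A relayed
// connection to the same peer could then carry a different set.
// (Hypothetical sketch, not part of go-libp2p's Connection interface.)
type ConnProtocols struct {
	mu    sync.RWMutex
	known bool // has Identify completed on this connection?
	set   map[string]bool
}

func NewConnProtocols() *ConnProtocols {
	return &ConnProtocols{set: make(map[string]bool)}
}

// SetProtocols records the protocols reported by Identify and marks
// the connection as identified.
func (c *ConnProtocols) SetProtocols(protos ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.known = true
	for _, p := range protos {
		c.set[p] = true
	}
}

// Supports distinguishes "Identify hasn't completed yet" (second
// return value false) from a definite yes/no answer.
func (c *ConnProtocols) Supports(proto string) (supported, identified bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.set[proto], c.known
}

func main() {
	direct := NewConnProtocols()
	direct.SetProtocols("/ipfs/kad/1.0.0", "/ipfs/ping/1.0.0")
	ok, done := direct.Supports("/ipfs/kad/1.0.0")
	fmt.Println(ok, done)

	relayed := NewConnProtocols() // Identify has not run on this connection
	ok, done = relayed.Supports("/ipfs/kad/1.0.0")
	fmt.Println(ok, done)
}
```

The two-value return is one way to address the `SupportsProtocol` ambiguity raised in the proposal.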
I just want to add a point relating to the "incomplete" persistence, and to the way the in-memory store is already being written to by the base libp2p.Host: the store keeps a key under the key suffix

It's not good key storage policy, even in the memory store, if one's code is already storing the key material in a more secure way while an unaware libp2p.Host is doing this and making all that meaningless; this is memory accessible to any and all other processes owned by the same user.

It also appears to be encoded in protobuf, for an in-memory record, and is decoded everywhere it is used anyway. Why not use Gob for this? For in-memory there's no migration issue, and for on-disk, if it's implemented with Badger, then all clients have Gob available and are even using it for some of the records.

For now, I'm just working directly with the Badger back end, using its built-in encryption capability and its iterators to walk the records. I have already forked the

I will keep following these related issues, and of course, if applicable, I would be happy to have my code be used or adapted to finish this task properly.
Just wanted to illustrate the point about how this code is incomplete and needs work, this is what I see right at the top of the kv store iteration:
That is the private key of the libp2p host, which, if used with the current pstoreds Badger back end, means writing that key in cleartext to the data store. That means it can be ripped from three different places: on disk in the persistent store, in memory in the badger/pstoremem instance, and in the libp2p.Host.
@l0k18 Did you intend to post your private key publicly here? Now it can be ripped from 4 different places 🙃 |
I think storing secrets in memory is fine. Many other applications do this (SSH, LUKS, ZFS). I agree it's even better if you can offload the secret to a TPM.
This isn't true. At least on Linux, processes can't read other processes' memory unless they opt into it as an IPC mechanism. I don't think it's obvious that using pstoreds will result in your host key being written to disk. Maybe that's another reason to remove this from the public API.
This is the main use case which was driving us towards using a db-backed peerstore. Without something writing to disk, we need to bootstrap every time a node restarts. Db-backed would also allow us to store arbitrary data for a peer in the metadata, e.g. if we wanted to serialize GossipSub score for persistence across reboots as well. Is the future approach intended to move all persistence into each individual application, or just focus it in the discovery mechanism?
Why's that? There are plenty of hooks to learn about connecting / disconnecting peers. It would be easy (and way more performant!) to hook into the event bus and periodically save some of the connected peers to disk.
Indeed it is. Storage considerations will vary greatly between applications, and it's not realistic to expect that libp2p will be able to provide a one-size-fits-all datastore.
I think at the moment it's because it works out of the box. We also haven't gone through perf testing yet to see what kinds of I/O patterns the disk-backed peerstore creates. Ideally, in a network with low churn, you wouldn't be updating the peerstore that often, and so wouldn't have much I/O.
I always write my cryptography tests to use freshly generated random values. I have a scheme that generates them fast using scalar addition also, but I haven't put it into the tests.
Badger is not much different in terms of I/O usage compared to LevelDB or RocksDB, but its split value/key tables mean a lot less writing when only keys are accessed, and make it useful to put small parts of the values in the keys. If the key table is small, or the app benefits from lower key search latency, it can be memory-mapped so that iterators stay purely in memory. On disk, of course, GC needs to be run after a lot of edits to keep iteration fast. I can't see any reason why migration needs to be supported, though. Non-Go KV stores are lagging behind what Badger can do. There is another Go-only KV store called 'pogreb' which is really good for data that is mostly append-only.
Related issue: #2754

We used to save signed peer records in the certified address book, but removed that code because it didn't filter the addresses properly, and the certified address book was not used anywhere else in go-libp2p. The issue, of course, is that another protocol was relying on identify's (undocumented?) behavior here. Having the existing peerstore be a core component of the host leads to these tightly coupled behaviors. I'm not sure identify needs to be so tightly coupled with the peer store.

Another solution would be for Identify to have a contract of events it emits and events it subscribes to. During the course of an identify run, it would emit events about what it has learned, and other components/services/protocols could subscribe to those events. Users could have a custom PeerStore, better suited to their application, that subscribes to those events. Or user protocols could consume those events directly.
The protocols list needs to be stored somewhere even if it's not stored in a peer store. E.g.
- Services, when first starting up, often want to find all peers that speak a specific protocol.
- Multistream definitely needs to know a peer's protocols so it can select the correct protocol without taking a round trip to negotiate anything.
That's why I suggested associating protocols with connections.
Observation: The current `Peerstore` does a lot of things, but it doesn't do any of them well.

Problems:

- … (`CertifiedAddressBook`).
- `SupportsProtocol` doesn't distinguish between Identify having completed and not, leading to complexity in applications (example).

For example, this shows up:
This is a proposal to radically shrink the peerstore, such that it offers only what is needed to dial connections. Applications running on top of libp2p, and even transports like QUIC that want to use 0-RTT, will need to come up with their own data structures to store peer-related data. This is inevitable, since different applications will have different needs regarding persisting data when a peer disconnects, and different heuristics for limiting the size of these data structures.
In particular, we’ll get rid of the following components:

- `PeerMetadata`: Only used to save the agent / protocol version sent in Identify. There’s no real reason to save that information in the first place.
- `Metrics`: Only used to store RTT statistics. Arguably, that’s not a per-peer, but a per-connection (or rather, per-path, if dealing with multipath transports) property. We should consider exposing the RTT on the `Connection` interface.
- `ProtoBook`: Used to store the protocols supported by a peer (as learned via Identify). Takes up huge amounts of storage, for little benefit. Protocols that need a list of peers supporting their protocol can easily subscribe to the respective `EventBus` notification.[^1]

The `KeyBook` will be radically simplified to only allow storing of the public key.

As a result, this leaves us with a peer store that only stores:

- addresses
- public keys
It basically becomes an address book, and maybe we should honor that fact by renaming it accordingly.
The following changes will make it a lot smarter and more useful:
We could consider removing the `Peerstore` interface. This is (mostly) a libp2p-internal component. In particular, it does not make sense to implement a datastore-backed version of it (it might, however, be interesting to be able to serialize the current state, so it can be restored easily).

Footnotes
[^1]: Note that this breaks an optimization for `Host.NewStream`, which currently picks the first supported protocol from a list of protocols. This is fine, since 1. worst case, this results in an extra round trip, 2. it is a rare use case, and 3. the application can track which protocols a peer speaks and then open a stream with just that protocol, speeding things up by 1 RTT compared to the status quo.