Blog Post: DHT #244

pstan26 · 2017-03-16T19:12:58Z

Sounds like @jcnelson is 60% done -- go Jude!

Once we have it completed, let's plan to:

get public reviews from P2P experts (Frans etc)
Request comment from Mike Freedman, potentially Petar, and others

Goal here is to create one post to clarify fundamental limitations of DHTs. We need to get reviews and request some lines be made public. There are many decentralized storage systems looking for a solution -- the fundamental DHT issues apply to any project using them.

Maybe we can add bullets on the post here or link to the blog post, @jcnelson ? Is it reasonable to aim for April 10th release?

jcnelson · 2017-03-16T20:02:23Z

Here's what I have so far:

One of the more interesting innovations in Blockstack is in
how it distributes the set of zone files to all peers.
Like several other decentralized Web infrastructure projects, the original system used a DHT
(distributed hash table). This was acceptable for building a prototype implementation while the network
was small and mostly-reliable. However, this would have been unacceptable in the long-term due to the
stability and security problems inherent to DHTs. Since November 2016, we have been distributing zone
files using a decentralized storage system built from first principles called the Atlas Network.

DHTs: Unsafe at any Speed

(This was previously discussed here,
and in the middle of this blog post).

DHTs came into being in the 1990's and 2000's, and were among the first
scalable decentralized data stores. The most widely-used DHTs power file-sharing
software like BitTorrent and eMule. These are based on the Kademlia
DHT, and enjoy widespread usage today.

Easy to Misuse

Since the mid 2000's, the distributed systems research community has learned several
hard but important lessons about the usability of DHTs. In particular, DHTs are
only useful if all of the following are true for your application:

You can't store a 100% replica. Your application stores so much data that
it is unreasonable to try to keep a full copy on any single node.
You will tolerate arbitrarily slow I/O. Due to the way DHTs handle
requests, reads and writes have very variable latency. Some incrmenetal
work on DHTs (like
DSHTs) try to
fix this for frequently-requested data, but the long-tail performance is still
terrible. This is inherent to DHT routing protocols; a pathological
request can needlessly bounce through several high-latency network links
before being handled.
You will periodically re-announce data. Public-access DHTs allow
anyone to write to them. To deal with so much data, they simply delete stale
data to relieve memory pressure. At the same time, nodes can come and go as
they please, which can cause key/value pairs to get dropped. To deal with this,
applications need to re-replicate data every so often to keep it available.
You will never change a key/value pair. DHTs can split into one or more disjoint
networks due to partitions, and re-join later on. While split, a client can
write conflicting key/value pairs to nodes on either side of the partition.
This leads to externally-visible inconsistency--some clients will see one value
for the key, and other clients will see a different value. To avoid this, the
application must ensure that there is at most one value per key (i.e. data is
immutable once written).

These are difficult demands to meet. Each application process effectively needs
to be online 24/7 to ensure its data is available, and even then the developer needs to
adress data consistency and I/O performance via some other way.

The reason BitTorrent and eMule can get
away with using DHTs is because (1) they use them only for "soft state," and (2)
they do not rely on them for correctness. Specifically, they use a DHT to
store tracking and routing information for a particular file.
This information is small by comparison to the
file, and can be re-created quickly by any peer. There are no consistency
concerns since since the .torrent file contains all the chunk hashes, and
there are no negative consequences of
receiving invalid tracking data since the hashes are known to all peers
beforehand. Peers also do not share data via the DHT, but connect to one
another directly.

Hard to Secure

The difficulties do not end there, though. DHTs are still vulnerable to three
types of attacks that can harm even the most resilient DHT-powered application.

Endless Data DDoS. Without some rate-limiting or access-control mechanism, a DHT has no way
to limit the amount of data inserted. An adversary can flood the DHT with lots
of garbage data and knock nodes offline. (This never applied to Blockstack,
since our DHT only accepted key/value pairs whose hashes were written to the
Bitcoin blockchain).
Hash-Bucket Takeover. By joining with the right nodes in a DHT, a Sybil
node can take over responsibility for specific hash buckets in the network. In
doing so, the Sybil will both be able to see who's asking for the data (a
privacy concern), and will be able to censor the data by ignoring or slowing
down requests (an availability concern). A Sybil node can also do extensive
damage to a running DHT by repeatedly joining it, taking over heavily-requested
buckets, and then dropping them.
Route-table Takeover. Suppose that the DHT let the developer "pin"
hash buckets to honest nodes to thwart hash-bucket
takeovers. This does not fix the availability problem, since a Sybil can insert itself
as the immediate neighbors of the honest node and prevent the rest of the
system from discovering it. This works because DHT nodes do not have a global
view of which nodes have which buckets. Instead, each node keeps a short list of
which nodes host buckets that are "close to" its buckets in the key space.
If the Sybil node can become an honest node's neighbors, then it can prevent
other nodes from discovering the honest node's data.

muneeb-ali · 2017-03-16T20:13:07Z

Thanks for posting Jude! I have a bunch of (private) notes on this and can take a crack at expanding the draft.

jackzampolin · 2018-01-10T22:16:40Z

Doesn't look like this made it into a blog post. @jcnelson @pstan26 Would this still be useful?

pstan26 added the Content File label Mar 16, 2017

pstan26 assigned jcnelson, muneeb-ali and pstan26 Mar 16, 2017

jackzampolin added area/content and removed Content File labels Jan 10, 2018

will-corcoran removed the area/content label Oct 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blog Post: DHT #244

Blog Post: DHT #244

pstan26 commented Mar 16, 2017 •

edited

jcnelson commented Mar 16, 2017 •

edited

muneeb-ali commented Mar 16, 2017

jackzampolin commented Jan 10, 2018

Blog Post: DHT #244

Blog Post: DHT #244

Comments

pstan26 commented Mar 16, 2017 • edited

jcnelson commented Mar 16, 2017 • edited

DHTs: Unsafe at any Speed

Easy to Misuse

Hard to Secure

muneeb-ali commented Mar 16, 2017

jackzampolin commented Jan 10, 2018

pstan26 commented Mar 16, 2017 •

edited

jcnelson commented Mar 16, 2017 •

edited