Closed DHT Based Routing #10
Comments
Specifically, I'm imagining a typical sharded KV architecture where the keyspace is broken up into shards and then each shard has some number of replica nodes.
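To make the sharded layout concrete, here is a minimal sketch of a keyspace split into shards, each served by a fixed set of replica nodes. The shard count, replica count, node names, and the round-robin placement are all assumptions chosen for illustration; nothing in this thread specifies them.

```python
import hashlib

NUM_SHARDS = 8          # assumed shard count, for illustration only
REPLICAS_PER_SHARD = 3  # assumed replication factor, for illustration only


def shard_for_key(key: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard by hashing it and reducing modulo the shard count."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shard_map(nodes, num_shards=NUM_SHARDS, replicas=REPLICAS_PER_SHARD):
    """Assign `replicas` distinct nodes to every shard (simple round-robin placement)."""
    if len(nodes) < replicas:
        raise ValueError("need at least as many nodes as replicas per shard")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(replicas)]
        for shard in range(num_shards)
    }


def replicas_for_key(key: bytes, shard_map) -> list:
    """Return the replica nodes responsible for a key."""
    return shard_map[shard_for_key(key, len(shard_map))]


if __name__ == "__main__":
    # Hypothetical node names.
    nodes = [f"node-{i}" for i in range(5)]
    shard_map = build_shard_map(nodes)
    print(replicas_for_key(b"example-key", shard_map))
```

Because every participant can compute the same map from the same node list, a client can address the replicas for a key directly instead of discovering them hop by hop.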
My proposal was to build something like this based on Koorde, where the rules for joining as a full DHT node would be strict, and thanks to Koorde's structure it would be possible to, for example, cleanly hand over records when a node has to go offline. Information about Koorde: https://github.com/libp2p/research-dht/issues/9
Out of curiosity. Putting aside the permissioning via consensus element, what's the difference between this and something like a KV store with sharding implemented via consistent hashing, à la Cassandra but with a KV model? Why would one pick a closed DHT versus a distributed database?
One of the primary reasons I would raise is that we know that a DHT can scale to 100s or 1000s of nodes. This isn't as clear in the case of distributed databases. Also, I think that it would be easier for a number of unrelated parties to run DHT nodes in a permissioned DHT network than a single distributed database cluster.
That makes sense. We do not require a private network between these nodes, right? Then I'd add that the solution needs to be fit for adversarial environments. It should also be self-healing upon departures, and collusion-resistant if replicas are decided deterministically and are known to everybody.
That's basically what I'm suggesting. (I'm using DHT literally to mean distributed hash table, not implying any kind of p2p-ness).
TLDR: Relying on specific centralized parties in the short term is ok, but nothing closed please 🙏🙏🙏 While I can be on board with voluntarily run supernodes (e.g. relays, rendezvous, etc.) that help us move from a centralized world towards a distributed one, a closed network feels like a step in the wrong direction. A simple issue that comes up is that if the system is "permissioned" then some organization is granting the permissions. That organization is now potentially subject to the whims of political and legal pressures to censor certain data from their network. Additionally, there are other issues such as there being less attention and focus on the open networks. @jimpick told me that apparently this occurred with Dat where their DHT was broken for a while, but nobody really noticed because they were defaulting to their centralized DNS solution. I'm fairly certain there are other tradeoffs we can make aside from closing/permissioning the system that will get us similar results. Some straw men suggestions include:
Overall, while it's nice that a permissioned system would help alleviate classic DHT issues like poisoning and Sybil attacks, it feels like just kicking the can on the entire premise of a distributed/p2p network. I would bet that in our current environment people are more likely to be mooches of free network resources (like the DHT) than to be trying to break everything (Sybil attacks/intentional poisoning to evict DHT records). If so, then if we just give people the ability to opt in to donating large amounts of resources to the network we'd probably end up with a smaller, faster DHT that is still open.
@lanzafame this sounds close to our discussions. We thought that having a general DHT for everything is suboptimal, and that the general DHT should just be used for service-specific DHT discovery. And that seems to be pretty much what is proposed here: ipfs/notes#291 (comment).
Ok, I missed an important point here: I'm not suggesting we replace our DHT with a closed DHT, I'm suggesting we add a secondary fast-path DHT.
That's basically what this is, except that you can't just show up and claim to be reliable. We need these nodes to actually be reliable, that's why this is "closed". However, that doesn't mean we have to have a single organization deciding who can participate (we can have the members of the DHT decide who can join).
We already put 20 copies. DHT nodes should probably rebalance but that can get really expensive. The real issue is that we simply can't provide the same performance/reliability guarantees as a centralized system with a fully open DHT whose nodes aren't completely reliable.
We could use a process similar to what Tor uses for its relays.
Follow-up: @jbenet noted that an incentivized approach would have the same effect while being more decentralized. Basically:
The tricky part would be proving that a member of this DHT isn't responding to queries but this isn't insoluble.
It looks like there is consensus that a single DHT is not an option, especially for the long term. That said, I see a hierarchical structure, which would help in many ways, including performance in terms of delivery delay, time to first byte, flow-completion time etc. In such a setup the top-level (and maybe also some lower-level, but not leaf) resolution system(s) (DHT or similar) would effectively be pointers to lower-layer "services", rather than content as such.

If we assume the above rather vague setup, I would support an incentivised approach to securing a place in the top-level resolution system. I would reasonably expect that not everyone would want a place in the top-level resolution system (i.e., be reachable globally). Consider a classroom/conference/smart city environment, where content is of interest only locally and hosts are not interested in making the content globally reachable. Those that want to make content globally reachable would have to stake resources. Staking would take into account volume of data, replication/redundancy for availability and performance guarantees etc. and would also depend on the topology, that is, the further up the hierarchy the content/service is advertised the higher the stake. In this case, we would still need mechanisms to prove that nodes do not misbehave and forward requests in time.
Libp2p currently relies on a fully p2p Kademlia DHT. Unfortunately, even if we get it to the point where it behaves optimally, it still won't scale enough to support our content routing needs.
Proposal: Implement a "closed" DHT with known members. Note: This is often called a "Distributed KV Store".
Unlike our Kademlia DHT:
Motivations:
Performing a 1-RTT lookup and/or a batch put requires a known routing table. The round-trips in traditional p2p DHTs all come from discovering this routing table along the way.
A known routing table requires some form of consensus on the members of this routing table. That's where the trust comes in.
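As a rough sketch of why a known routing table removes the extra round-trips: if every client holds the agreed member list, it can compute the responsible node locally and contact it directly, and it can group a batch of puts so each member receives a single message. The class name `KnownRoutingTable`, the member names, and the consistent-hashing placement are assumptions for illustration; the proposal does not fix a placement scheme, and real membership would come from whatever consensus process the members use.

```python
import bisect
import hashlib
from collections import defaultdict


def _point(data: bytes) -> int:
    """Hash bytes to a position on a 64-bit ring."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class KnownRoutingTable:
    """Hypothetical routing table built from an agreed-upon member list."""

    def __init__(self, members):
        if not members:
            raise ValueError("member list must not be empty")
        # Sort members by ring position so lookups are a binary search.
        self._ring = sorted((_point(m.encode()), m) for m in members)
        self._points = [p for p, _ in self._ring]

    def owner(self, key: bytes) -> str:
        """Return the member responsible for `key`, computed locally (no network hops)."""
        i = bisect.bisect_left(self._points, _point(key)) % len(self._ring)
        return self._ring[i][1]

    def plan_batch_put(self, keys):
        """Group keys by responsible member so each member gets one batched request."""
        batches = defaultdict(list)
        for key in keys:
            batches[self.owner(key)].append(key)
        return dict(batches)


if __name__ == "__main__":
    table = KnownRoutingTable(["member-a", "member-b", "member-c"])
    keys = [f"cid-{i}".encode() for i in range(10)]
    for member, batch in table.plan_batch_put(keys).items():
        print(member, len(batch))
```

The lookup itself costs no round-trips, so the only network exchange left is the request to the owner. The catch is that every client must see the same member list, which is the consensus requirement described above.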
The last part of this is long-lived routing records. The Internet Archive has ~400e9 files which equates to at least 35TiB of "provider records". However, the IA isn't adding 400e9 files per day. Given stable nodes that can be trusted to keep long-lived records alive, the IA wouldn't have to keep re-broadcasting old records to keep them alive.
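The figures above imply a per-record size and a re-broadcast cost that are easy to check. The sketch below only redoes the arithmetic from the two numbers quoted in the paragraph; the republish interval is a hypothetical parameter, since no interval is stated here.

```python
FILES = 400e9                    # ~400e9 files, from the text above
TOTAL_RECORD_BYTES = 35 * 2**40  # 35 TiB lower bound, from the text above

# Implied lower bound on the size of one provider record (roughly 96 bytes).
bytes_per_record = TOTAL_RECORD_BYTES / FILES


def rebroadcast_bytes_per_day(republish_interval_hours: float) -> float:
    """Bytes re-sent per day if every record must be refreshed once per interval.

    `republish_interval_hours` is an assumed parameter, not a value from this issue.
    """
    return TOTAL_RECORD_BYTES * (24 / republish_interval_hours)


if __name__ == "__main__":
    print(f"bytes per record: {bytes_per_record:.1f}")
    tib_per_day = rebroadcast_bytes_per_day(24) / 2**40
    print(f"TiB re-sent per day at a 24h interval: {tib_per_day:.0f}")
```

With long-lived records on trusted nodes, the recurring cost drops to the records for newly added files only, which is the point of the paragraph above.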
Notes:
CC @Kubuxu, IIRC you already proposed something kind of like this but I couldn't find the proposal.
CC @obo20 as this is really important for pinning services.