Coordinator and Data nodes
- Coordinator nodes with many Data nodes. We can use Raft for coordinator nodes
- Coordinator node is served over `http` (`axum`, or gRPC with `tonic`); a minimal `axum` sketch follows the list
- File is split into shards that are kept on data nodes, distributed and replicated (see the sharding sketch after this list)
- First, clients communicate with the coordinator node to set up operations and metadata; then the actual file content is accessed directly from the data nodes
- Data nodes also act as an interface for `DHT` queries, accessing the actual data from `tikv`
- After a shard is uploaded to the data nodes, they use `DHT` and `BitTorrent` between themselves to replicate the shard to multiple nodes; this doesn't require the coordinator node
- Coordinator node communicates with the `tikv` cluster to get/put metadata about the file (see the `tikv` sketch after this list)
- Metadata contains information about the files, where the key is the piece hash and the value is the list of data nodes holding that piece
- `WAL` strategy is used to commit files to all of our replicas and update `tikv` with the data node handle
- Coordinator/Data node communication is done over a channel; we can try out `Kafka` or maybe `gRPC`, to make sure that a data node contains a shard or to distribute a shard over the data nodes when a client uploads a file
- Client is then given a list of data nodes so it can access the shards in parallel and assemble the file (see the parallel-fetch sketch below)
https://docs.rs/raft/latest/raft/
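A minimal single-voter sketch following the `raft` crate's documented `RawNode` pattern (it needs `slog` for logging); persisting the Ready state and sending messages to peers is left out:

```rust
use raft::{prelude::*, storage::MemStorage};
use slog::{o, Discard, Logger};

fn main() {
    // Single-voter configuration; id 1 would be this coordinator node.
    let config = Config { id: 1, ..Default::default() };
    config.validate().unwrap();

    // In-memory log storage with node 1 as the only voter.
    let storage = MemStorage::new_with_conf_state((vec![1], vec![]));
    let logger = Logger::root(Discard, o!());
    let mut node = RawNode::new(&config, storage, &logger).unwrap();

    // Drive the state machine: tick on a timer, then drain the Ready state.
    node.tick();
    if node.has_ready() {
        let ready = node.ready();
        // ...persist entries and send messages to peer coordinators here...
        let _light_rd = node.advance(ready);
    }
}
```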
From https://en.wikipedia.org/wiki/CAP_theorem I think we should target Consistency and Availability. Availability is also affected by network partitions, so in that case we will choose Consistency.