Coordinator and Data nodes
- The cluster consists of coordinator nodes and many data nodes. Raft can be used among the coordinator nodes for consensus
- The coordinator node is served over HTTP (axum) or gRPC, running on tokio (see the axum sketch after this list)
- A file is split into shards, which are distributed and replicated across data nodes (see the sharding sketch after this list)
- Clients first communicate with the coordinator node to set up operations and metadata; the actual file content is then read from and written to the data nodes
- Data nodes also act as an interface for DHT queries, accessing the actual data from tikv
- After a shard is uploaded to a data node, the data nodes use DHT and BitTorrent among themselves to replicate the shard to multiple nodes; this does not require the coordinator node
- The coordinator node communicates with the tikv cluster to get/put metadata about the file
- Metadata maps each piece (shard) hash, used as the key, to the list of data nodes that hold that piece (see the tikv sketch after this list)
- A WAL (write-ahead log) strategy is used to commit files to all of our replicas and then update tikv with the data node handles
- Coordinator/data node communication is done over a channel; we can try out Kafka, or maybe gRPC, to verify that a data node contains a shard or to distribute a shard over the data nodes when a client uploads a file
- The client is then given a list of data nodes so it can access the shards in parallel and assemble the file (see the parallel-fetch sketch after this list)
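A minimal sketch of how the coordinator's HTTP side could look with axum (0.7) on tokio. The route layout, the `ShardLocation` type, and the hard-coded node addresses are illustrative assumptions, not a defined API; the real handler would look the shard locations up in tikv.

```rust
use axum::{extract::Path, routing::get, Json, Router};
use serde::Serialize;

#[derive(Serialize)]
struct ShardLocation {
    shard_hash: String,      // hash of the shard (used as the metadata key)
    data_nodes: Vec<String>, // addresses of data nodes holding this shard
}

// Handler: given a file id, return where its shards live so the client
// can fetch them directly from the data nodes.
async fn file_shards(Path(file_id): Path<String>) -> Json<Vec<ShardLocation>> {
    // Hard-coded for illustration; the coordinator would query tikv here.
    Json(vec![ShardLocation {
        shard_hash: format!("{file_id}-shard-0"),
        data_nodes: vec!["10.0.0.11:9000".into(), "10.0.0.12:9000".into()],
    }])
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/files/:file_id/shards", get(file_shards));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```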
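A sketch of splitting a file into fixed-size shards and hashing each one; the hash is what would serve as the metadata key. The 64 MiB shard size and SHA-256 are assumptions for illustration.

```rust
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{BufReader, Read};

const SHARD_SIZE: usize = 64 * 1024 * 1024; // assumed shard size

struct Shard {
    hash: [u8; 32], // shard (piece) hash, used as the metadata key
    data: Vec<u8>,
}

fn split_into_shards(path: &str) -> std::io::Result<Vec<Shard>> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut shards = Vec::new();
    loop {
        let mut buf = vec![0u8; SHARD_SIZE];
        let mut filled = 0;
        // Fill the buffer up to SHARD_SIZE or until EOF.
        while filled < SHARD_SIZE {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        buf.truncate(filled);
        let hash = Sha256::digest(&buf).into();
        shards.push(Shard { hash, data: buf });
    }
    Ok(shards)
}
```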
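A sketch of the metadata scheme against tikv's raw KV API, assuming the `tikv-client` crate: the key is the shard (piece) hash, the value is the list of data nodes that hold that shard. The JSON encoding of the value and the PD endpoint address are assumptions.

```rust
use tikv_client::RawClient;

// key = shard hash, value = JSON-encoded list of data node addresses
async fn put_shard_locations(
    client: &RawClient,
    shard_hash: [u8; 32],
    data_nodes: &[String],
) -> Result<(), Box<dyn std::error::Error>> {
    let value = serde_json::to_vec(data_nodes)?;
    client.put(shard_hash.to_vec(), value).await?;
    Ok(())
}

async fn get_shard_locations(
    client: &RawClient,
    shard_hash: [u8; 32],
) -> Result<Option<Vec<String>>, Box<dyn std::error::Error>> {
    match client.get(shard_hash.to_vec()).await? {
        Some(bytes) => Ok(Some(serde_json::from_slice(&bytes)?)),
        None => Ok(None),
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // PD endpoint address is an assumption.
    let client = RawClient::new(vec!["127.0.0.1:2379"]).await?;
    put_shard_locations(&client, [0u8; 32], &["10.0.0.11:9000".into()]).await?;
    println!("{:?}", get_shard_locations(&client, [0u8; 32]).await?);
    Ok(())
}
```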
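A sketch of the client downloading shards from the data nodes in parallel and assembling the file, using reqwest and futures on tokio. The data node route `/shards/<hash>` is a hypothetical endpoint, not a defined API.

```rust
use futures::future::try_join_all;

// Fetch a single shard from one of the data nodes returned by the coordinator.
async fn fetch_shard(node: &str, shard_hash: &str) -> Result<Vec<u8>, reqwest::Error> {
    let url = format!("http://{node}/shards/{shard_hash}"); // hypothetical route
    Ok(reqwest::get(url).await?.bytes().await?.to_vec())
}

// `shards` is the ordered list of (shard hash, data node address) from the
// coordinator; shards are downloaded concurrently, then joined in order.
async fn assemble_file(shards: &[(String, String)]) -> Result<Vec<u8>, reqwest::Error> {
    let downloads = shards.iter().map(|(hash, node)| fetch_shard(node, hash));
    let parts = try_join_all(downloads).await?;
    Ok(parts.concat())
}
```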