Our project is a sharded, fault-tolerant graph database built using Raft. Data can be sharded across multiple clusters, each with multiple replicas, and is persisted on disk for high read and write throughput. Replication and fault tolerance are handled by Raft.
Raft is a consensus algorithm designed to be easy to understand. It is equivalent to Paxos in fault tolerance and performance. Each server in a cluster can be in one of the following three states:
- Leader
- Follower
- Candidate
Generally, the servers are in the leader or follower state. Log entries are numbered sequentially and contain a term number. An entry is considered committed once it has been replicated to a majority of the servers. RPC communication is unidirectional, from the leader to the followers; the followers never ping the leader. The leader sends AppendEntries messages to the followers with logs containing state updates. An AppendEntries message with zero log entries is treated as a Heartbeat, and the leader sends Heartbeats to all followers at regular intervals.

For voting, each server persists its current term and vote, so it doesn't end up voting twice in the same term. On receiving a RequestVote RPC, a server denies its vote if its own log is more up-to-date than the candidate's. It also denies the vote if a minimum ElectionTimeout hasn't passed since the last Heartbeat from the leader. Otherwise, it grants its vote and resets its ElectionTimeout timer.
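As an illustration of these voting rules, here is a minimal sketch in Go (the Server type, its fields, and grantVote are hypothetical names for illustration, not our actual implementation):

```go
package main

import (
	"fmt"
	"time"
)

// Server holds the per-server Raft state relevant to voting.
type Server struct {
	currentTerm        int
	votedFor           string
	lastLogIndex       int
	lastLogTerm        int
	lastHeartbeat      time.Time
	minElectionTimeout time.Duration
}

// RequestVoteArgs is a simplified RequestVote RPC payload.
type RequestVoteArgs struct {
	Term         int    // candidate's term
	CandidateID  string // candidate requesting the vote
	LastLogIndex int    // index of the candidate's last log entry
	LastLogTerm  int    // term of the candidate's last log entry
}

// grantVote applies the rules above: never vote twice in a term, deny if our
// log is more up-to-date than the candidate's or if a Heartbeat arrived within
// the minimum ElectionTimeout, otherwise record the vote.
func (s *Server) grantVote(args RequestVoteArgs, now time.Time) bool {
	if args.Term == s.currentTerm && s.votedFor != "" && s.votedFor != args.CandidateID {
		return false // already voted for someone else this term
	}
	if s.lastLogTerm > args.LastLogTerm ||
		(s.lastLogTerm == args.LastLogTerm && s.lastLogIndex > args.LastLogIndex) {
		return false // our log is more up-to-date than the candidate's
	}
	if now.Sub(s.lastHeartbeat) < s.minElectionTimeout {
		return false // we heard from a leader too recently
	}
	// In a real server, the term and vote are persisted to disk here and the
	// ElectionTimeout timer is reset.
	s.currentTerm = args.Term
	s.votedFor = args.CandidateID
	return true
}

func main() {
	s := &Server{currentTerm: 3, lastLogIndex: 10, lastLogTerm: 3,
		lastHeartbeat: time.Now().Add(-2 * time.Second), minElectionTimeout: time.Second}
	fmt.Println(s.grantVote(RequestVoteArgs{Term: 4, CandidateID: "node2",
		LastLogIndex: 12, LastLogTerm: 3}, time.Now())) // true
}
```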
Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a hash table.
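As a minimal sketch of the idea (illustrative only; Ring, NewRing, and Lookup are assumed names, not our actual API):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hashing ring. Each server is placed at several
// points ("virtual nodes") on the ring; a key is owned by the first server
// clockwise from the key's hash.
type Ring struct {
	points  []uint32          // sorted hash points
	servers map[uint32]string // point -> server name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(servers []string, virtualNodes int) *Ring {
	r := &Ring{servers: map[uint32]string{}}
	for _, srv := range servers {
		for i := 0; i < virtualNodes; i++ {
			p := hash32(fmt.Sprintf("%s#%d", srv, i))
			r.points = append(r.points, p)
			r.servers[p] = srv
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the server that owns the given key.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) { // wrap around the ring
		i = 0
	}
	return r.servers[r.points[i]]
}

func main() {
	ring := NewRing([]string{"AlphaOne", "AlphaTwo", "AlphaThree"}, 64)
	fmt.Println(ring.Lookup("<Friends, Sanchit>"))
	fmt.Println(ring.Lookup("<lives-in, Sanchit>"))
}
```

Adding or removing a server only reassigns the keys adjacent to its points on the ring, which is why the scheme is independent of the number of servers.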
Sharding is a way of scaling horizontally. A sharded database architecture splits a large database into several smaller databases. Each smaller component is called a shard. Instead of storing all data on a single server, we distribute it across several servers. This reduces the load on a single resource and instead distributes it equally across all the servers. This allows us to serve more requests and traffic from the growing number of customers while maintaining performance.
Our architecture consists of two types of nodes:
The Zero node acts as the master node: its job is to map keys to an Alpha Group, to maintain the status of each Alpha Group, and to balance nodes evenly among the Alpha Groups.
Each Alpha group is a group of nodes that are replicated using Raft, so each shard is replicated on multiple nodes for fault tolerance. The assumption is that with a replication factor of K = 3 or 5, the probability of an entire Alpha Group failing is close to zero: with independent node failures at probability p, a 3-way replicated group loses its majority only when two of its three nodes are down, which happens with probability on the order of p².
Due to the use of Raft for replication, our system is a CP system: in CAP terms, it favours consistency over availability.
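To make the split of responsibilities concrete, here is a rough sketch in Go of the metadata the Zero node might track (the type and field names are assumptions for illustration, not our actual schema):

```go
package main

import "fmt"

type GroupID uint32

// AlphaGroup is what Zero tracks per group: its member nodes, its current
// Raft leader, and how many predicates it is serving (used for balancing).
type AlphaGroup struct {
	Members    []string
	Leader     string
	Predicates int
}

// Zero maps each predicate to the Alpha group serving it and keeps the status
// of every group so it can move predicates from loaded groups to lighter ones.
type Zero struct {
	Groups     map[GroupID]*AlphaGroup
	Predicates map[string]GroupID // e.g. "<Friends, Sanchit>" -> 1
}

func main() {
	z := Zero{
		Groups: map[GroupID]*AlphaGroup{
			1: {Members: []string{"node1", "node5", "node6"}, Leader: "node1", Predicates: 2},
			2: {Members: []string{"node2", "node3", "node4"}, Leader: "node3", Predicates: 2},
		},
		Predicates: map[string]GroupID{
			"<Friends, Sanchit>":  1,
			"<lives-in, Sanchit>": 2,
		},
	}
	fmt.Println(z.Groups[z.Predicates["<Friends, Sanchit>"]].Members)
}
```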
Since we are building a Graph Database, which is known to be performant for JOIN-type queries, it was important for us to optimise the following operation:
Get all the <Nodes> that have a <Relation> with a specific <Node>
When thinking about sharding a highly connected graph, as seen in social networking platforms such as Facebook and Twitter, you can think about various strategies. One often-used approach is to randomly choose users (nodes or vertices of the graph) and assign them to shards.
Example: Get all the Friends of Sanchit
The “random sharding” model introduces randomness into graph traversal as well: to gather the data for a single query, a traversal may hop across the network from one server to another multiple times, resulting in latency issues.
So it was imperative that all the nodes sharing the same predicate, <Sanchit>.<Friend> in this case, were placed in the same group. For each predicate and its corresponding subjects and objects there is a single key-value pair, so this operation can be served with very low latency.
For example, this is a valid state of the database:
AlphaOne:
key = <Friends, Sanchit>
value = <Raj, Kannav, ...>
key = <Friends, Kannav>
value = <Sanchit, Raj, ...>
...
AlphaTwo:
key = <lives-in, Sanchit>
value = <Delhi>
key = <lives-in, Kannav>
value = <Ludhiana>
...
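The same state can be pictured as plain key-value pairs; here is an illustrative sketch in Go (the PredicateKey type and field names are assumptions, not our on-disk format):

```go
package main

import "fmt"

// PredicateKey identifies one predicate for one subject node,
// e.g. {Relation: "Friends", Subject: "Sanchit"}.
type PredicateKey struct {
	Relation string
	Subject  string
}

func main() {
	// One key-value pair per (relation, subject); the value is the list of
	// objects, so "get all Friends of Sanchit" is a single lookup.
	alphaOne := map[PredicateKey][]string{
		{"Friends", "Sanchit"}: {"Raj", "Kannav"},
		{"Friends", "Kannav"}:  {"Sanchit", "Raj"},
	}
	alphaTwo := map[PredicateKey][]string{
		{"lives-in", "Sanchit"}: {"Delhi"},
		{"lives-in", "Kannav"}:  {"Ludhiana"},
	}

	fmt.Println(alphaOne[PredicateKey{"Friends", "Sanchit"}]) // [Raj Kannav]
	fmt.Println(alphaTwo[PredicateKey{"lives-in", "Kannav"}]) // [Ludhiana]
}
```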
- Map the Key@Relation predicate to an Alpha group (consistent hashing, so partitioning/repartitioning is easy). Zero points to the Alpha group that is serving all the requests to this predicate: Hash(Key, Relation) = GroupID (see the sketch after this list)
- Find the leader of the resulting group to reach the nodes which contain the data, e.g. g = {node1, node5, node6}
- In the case of a write, contact the leader of this group and perform the write
- Read-only operations can be handled by the replicas themselves
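Putting the steps together, here is a minimal client-side routing sketch in Go (the Hash, Group, and route names are illustrative assumptions; a real deployment would consult Zero's consistent-hashing assignment rather than the plain modulo used here):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type GroupID uint32

// Group is an Alpha group: its Raft leader plus follower replicas.
type Group struct {
	Leader   string
	Replicas []string
}

// Hash maps a (key, relation) predicate to a group, as in
// Hash(Key, Relation) = GroupID.
func Hash(key, relation string, numGroups uint32) GroupID {
	h := fnv.New32a()
	h.Write([]byte(key + "@" + relation))
	return GroupID(h.Sum32() % numGroups)
}

// route picks the node that should handle the request: writes go to the
// group's Raft leader, reads can be served by any replica.
func route(groups map[GroupID]Group, key, relation string, write bool) string {
	g := groups[Hash(key, relation, uint32(len(groups)))]
	if write {
		return g.Leader
	}
	// Trivial read policy: just use the first replica.
	return g.Replicas[0]
}

func main() {
	groups := map[GroupID]Group{
		0: {Leader: "node1", Replicas: []string{"node5", "node6"}},
		1: {Leader: "node2", Replicas: []string{"node3", "node4"}},
	}
	fmt.Println(route(groups, "Sanchit", "Friends", true))  // leader, for a write
	fmt.Println(route(groups, "Sanchit", "Friends", false)) // replica, for a read
}
```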
Once the Zero and the Alpha groups are set up, we can start sending HTTP requests.
All requests to the graph use the path /<key>/<relation>:
- Method: PUT
- Description: Store a graph Node in the system
- Request body:
  {
    "value": [string]
  }
- Method: GET
- Description: Get a list of values pointed to by the location
- Response: Array containing all the values that correspond to the query
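As a usage sketch, assuming a server listening on localhost:8080 (the address is an illustrative assumption, not specified here), a client could issue the two requests like this:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	base := "http://localhost:8080" // illustrative address

	// PUT /<key>/<relation>: store the values for Sanchit's Friends relation.
	body, _ := json.Marshal(map[string][]string{"value": {"Raj", "Kannav"}})
	req, err := http.NewRequest(http.MethodPut, base+"/Sanchit/Friends", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	putResp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	putResp.Body.Close()

	// GET /<key>/<relation>: read back all values at that location.
	resp, err := http.Get(base + "/Sanchit/Friends")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // expected: an array such as ["Raj", "Kannav", ...]
}
```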