Raft Consensus Primer
A practical walkthrough of the Raft consensus protocol for distributed systems engineers. Raft solves the problem of getting a cluster of nodes to agree on a sequence of state transitions, even when some nodes crash: the cluster keeps making progress as long as a majority of its nodes are up and can communicate. This primer covers leader election, log replication, and safety guarantees.
1. Leader Election
Every Raft node is in one of three states: follower, candidate, or leader. If a follower receives no heartbeat from the leader within its election timeout (typically 150-300ms), it becomes a candidate, increments its current term, votes for itself, and sends RequestVote RPCs to its peers. A candidate that collects votes from a majority of the cluster becomes the new leader and starts sending heartbeats to assert its authority.
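- Randomized election timeouts make split votes rare, because one follower usually times out and wins before the others start campaigning
- Each node grants at most one vote per term, on a first-come, first-served basis
- A node that sees a higher term in any message adopts that term and reverts to follower; requests carrying a stale term are rejected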
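The election rules above can be sketched in Go as follows. This is a minimal, single-threaded sketch: the names `Node`, `RequestVoteArgs`, `StartElection`, `HandleRequestVote`, `observeTerm`, and `wonElection` are hypothetical, and networking, persistence, and locking are omitted. The vote handler also applies Raft's election restriction, under which a node refuses to vote for a candidate whose log is less up-to-date than its own.

```go
package main

import (
	"math/rand"
	"time"
)

// Role is one of the three Raft node states.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Node holds the per-node fields needed for elections (hypothetical layout).
type Node struct {
	ID           int
	Role         Role
	CurrentTerm  int
	VotedFor     int // -1 means no vote cast in CurrentTerm
	LastLogIndex int
	LastLogTerm  int
}

// RequestVoteArgs is what a candidate sends to its peers.
type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

// electionTimeout picks a random duration in [150ms, 300ms) so followers
// rarely time out, and start competing elections, at the same moment.
func electionTimeout() time.Duration {
	spread := int64(150 * time.Millisecond)
	return 150*time.Millisecond + time.Duration(rand.Int63n(spread))
}

// StartElection runs when the election timeout fires without a heartbeat.
func (n *Node) StartElection() RequestVoteArgs {
	n.Role = Candidate
	n.CurrentTerm++
	n.VotedFor = n.ID // a candidate votes for itself
	return RequestVoteArgs{Term: n.CurrentTerm, CandidateId: n.ID,
		LastLogIndex: n.LastLogIndex, LastLogTerm: n.LastLogTerm}
}

// observeTerm applies "higher term wins": a newer term in any message
// makes the node adopt it, clear its vote, and revert to follower.
func (n *Node) observeTerm(term int) {
	if term > n.CurrentTerm {
		n.CurrentTerm = term
		n.VotedFor = -1
		n.Role = Follower
	}
}

// HandleRequestVote returns the voter's current term and whether it granted its vote.
func (n *Node) HandleRequestVote(args RequestVoteArgs) (int, bool) {
	n.observeTerm(args.Term)
	if args.Term < n.CurrentTerm {
		return n.CurrentTerm, false // candidate is from a stale term
	}
	// Election restriction: the candidate's log must be at least as
	// up-to-date as ours (compare last term first, then last index).
	upToDate := args.LastLogTerm > n.LastLogTerm ||
		(args.LastLogTerm == n.LastLogTerm && args.LastLogIndex >= n.LastLogIndex)
	// One vote per term: refuse if we already voted for someone else.
	if (n.VotedFor == -1 || n.VotedFor == args.CandidateId) && upToDate {
		n.VotedFor = args.CandidateId
		return n.CurrentTerm, true
	}
	return n.CurrentTerm, false
}

// wonElection reports whether votes form a strict majority of the cluster.
func wonElection(votes, clusterSize int) bool {
	return votes > clusterSize/2
}
```

Driving this from a timer (re-arming `electionTimeout()` on every heartbeat) and a real RPC layer is left out to keep the sketch focused on the voting rules.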
2. Log Replication
Once elected, the leader accepts client commands, appends each one to its log, and replicates the new entries to followers via AppendEntries RPCs. An entry is committed once the leader that created it has stored it on a majority of servers. The leader then applies the entry to its state machine and includes the updated commit index (LeaderCommit) in subsequent AppendEntries RPCs, which tells followers to apply it as well.
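The commit rule described above can be sketched from the leader's per-server `matchIndex` values. The function names `advanceCommitIndex` and `followerCommitIndex`, and the 1-based `logTerms` slice, are assumptions for illustration rather than any particular library's API. Note the current-term check: Raft only commits entries from the leader's own term by counting replicas, and earlier entries become committed along with them.

```go
package main

import "sort"

// advanceCommitIndex returns the leader's new commit index.
// matchIndex holds, for every server including the leader, the highest log
// index known to be replicated on it. logTerms[i] is the term of the entry
// at index i; index 0 is an unused sentinel so log indices start at 1.
func advanceCommitIndex(commitIndex, currentTerm int, matchIndex, logTerms []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// In descending order, the value at position len/2 is stored on at
	// least len/2+1 servers, which is a majority.
	majority := sorted[len(sorted)/2]
	for idx := majority; idx > commitIndex; idx-- {
		// Count replicas only for entries from the current term; older
		// entries become committed along with them.
		if logTerms[idx] == currentTerm {
			return idx
		}
	}
	return commitIndex
}

// followerCommitIndex is the follower-side rule: on a successful
// AppendEntries, advance to min(LeaderCommit, index of last new entry).
func followerCommitIndex(commitIndex, leaderCommit, lastNewIndex int) int {
	next := leaderCommit
	if lastNewIndex < next {
		next = lastNewIndex
	}
	if next > commitIndex {
		return next
	}
	return commitIndex // never move the commit index backwards
}
```

For example, with five servers whose match indices are 5, 5, 4, 1, and 0, a current term of 2, and entries 3 through 5 created in term 2, `advanceCommitIndex` moves the commit index to 4.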
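```go
// AppendEntries RPC types.
// LogEntry is one slot in the replicated log.
type LogEntry struct {
	Term    int         // term in which the leader received the entry
	Command interface{} // client command to apply to the state machine
}

// AppendEntriesArgs is sent by the leader to replicate log entries;
// an empty Entries slice serves as a heartbeat.
type AppendEntriesArgs struct {
	Term         int        // leader's current term
	LeaderId     int        // lets followers redirect clients to the leader
	PrevLogIndex int        // index of the entry immediately preceding Entries
	PrevLogTerm  int        // term of the entry at PrevLogIndex
	Entries      []LogEntry // new entries to store (empty for a heartbeat)
	LeaderCommit int        // leader's commit index
}

// A follower rejects the RPC if its log has no entry at PrevLogIndex whose
// term equals PrevLogTerm; the leader then decrements its nextIndex for
// that follower and retries with an earlier PrevLogIndex.
```
3. Safety Guarantees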
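Raft maintains five key safety properties:
- Election safety: at most one leader can be elected in a given term
- Leader append-only: a leader never overwrites or deletes entries in its own log; it only appends new ones
- Log matching: if two logs contain an entry with the same index and term, the logs are identical in all entries up through that index
- Leader completeness: if an entry is committed in a given term, it is present in the logs of the leaders of all higher-numbered terms
- State machine safety: if a server has applied an entry at a given index, no other server ever applies a different entry at that index
Together these guarantee that committed entries are never lost and that every node applies the same commands in the same order.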
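One way to exercise these properties is to check them as invariants in a test harness. The sketch below checks election safety, log matching, and state machine safety over recorded cluster state; the input shapes (a list of `(term, leaderID)` observations, per-node logs of a hypothetical `LogEntry` type with string commands, and per-node applied command sequences) are assumptions, and a real harness would collect this data from a simulated or instrumented cluster.

```go
package main

// LogEntry is a hypothetical minimal entry: creation term plus a command.
type LogEntry struct {
	Term    int
	Command string
}

// checkElectionSafety takes recorded (term, leaderID) pairs and reports
// whether every term had at most one leader.
func checkElectionSafety(observed [][2]int) bool {
	leaderOf := make(map[int]int)
	for _, o := range observed {
		term, id := o[0], o[1]
		if prev, ok := leaderOf[term]; ok && prev != id {
			return false
		}
		leaderOf[term] = id
	}
	return true
}

// checkLogMatching checks two logs: if they hold an entry with the same
// index and term, all entries up through that index must be identical.
// Checking the highest such index is enough, since it covers every lower one.
func checkLogMatching(a, b []LogEntry) bool {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := n - 1; i >= 0; i-- {
		if a[i].Term != b[i].Term {
			continue
		}
		for j := 0; j <= i; j++ {
			if a[j] != b[j] {
				return false
			}
		}
		return true
	}
	return true
}

// checkStateMachineSafety takes each server's sequence of applied commands
// and reports whether no two servers applied different commands at the
// same index (pairwise, the shorter sequence is a prefix of the longer).
func checkStateMachineSafety(applied [][]string) bool {
	for x := range applied {
		for y := x + 1; y < len(applied); y++ {
			a, b := applied[x], applied[y]
			n := len(a)
			if len(b) < n {
				n = len(b)
			}
			for i := 0; i < n; i++ {
				if a[i] != b[i] {
					return false
				}
			}
		}
	}
	return true
}
```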