
Raft Consensus Primer

A practical walkthrough of the Raft consensus protocol for distributed systems engineers. Raft solves the problem of getting a cluster of nodes to agree on a sequence of state transitions, even when some nodes fail. This primer covers leader election, log replication, and safety guarantees.

1. Leader Election

Every Raft node is in one of three states: follower, candidate, or leader. When a follower receives no heartbeat from a leader within its election timeout (typically 150-300ms), it becomes a candidate, increments its term, votes for itself, and requests votes from its peers. A candidate that collects votes from a majority of the cluster becomes the new leader.

  • Randomized timeouts make split votes rare
  • Each node casts at most one vote per term
  • A node that sees a higher term adopts it and reverts to follower
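
The election rules above can be sketched in Go as follows. This is a minimal illustration, not a full implementation: the names Node, HandleRequestVote, and ElectionTimeout are hypothetical, and the check that a candidate's log is at least as up to date as the voter's is deliberately left out to keep the focus on terms and votes.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Node holds only the election state this sketch needs.
// The type and field names are illustrative assumptions.
type Node struct {
	CurrentTerm int
	VotedFor    int // -1 means no vote cast in CurrentTerm
}

// ElectionTimeout picks a random timeout in the 150-300ms range,
// so that nodes rarely become candidates at the same moment.
func ElectionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(151)) * time.Millisecond
}

// HandleRequestVote decides whether to grant a vote to candidateID
// for the given term. The log up-to-date check is omitted here.
func (n *Node) HandleRequestVote(term, candidateID int) bool {
	if term < n.CurrentTerm {
		return false // stale candidate: its term is behind ours
	}
	if term > n.CurrentTerm {
		// A higher term is adopted and clears any earlier vote.
		// A real node would also revert to follower at this point.
		n.CurrentTerm = term
		n.VotedFor = -1
	}
	if n.VotedFor == -1 || n.VotedFor == candidateID {
		n.VotedFor = candidateID // at most one vote per term
		return true
	}
	return false
}

func main() {
	n := &Node{CurrentTerm: 1, VotedFor: -1}
	fmt.Println(n.HandleRequestVote(2, 7)) // true: first vote in term 2
	fmt.Println(n.HandleRequestVote(2, 9)) // false: already voted in term 2
	fmt.Println(n.HandleRequestVote(3, 9)) // true: higher term resets the vote
	fmt.Println(n.HandleRequestVote(1, 7)) // false: stale term
	fmt.Println(ElectionTimeout())         // somewhere between 150ms and 300ms
}
```

Repeating a grant to the same candidate within a term is allowed so that a retransmitted request gets a consistent answer.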

2. Log Replication

Once elected, the leader accepts client commands, appends them to its log, and replicates the entries to followers via AppendEntries RPCs. An entry is committed once the leader has replicated it to a majority of nodes. The leader then applies it to its state machine and advertises the new commit index in later AppendEntries RPCs so that followers apply it too.
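
One way to picture the majority rule: if the leader tracks, for every node, the highest log index known to be stored there, the commit point is the highest index present on a majority. The sketch below assumes a hypothetical helper named CommitIndex and a matchIndex slice that includes the leader itself. It omits the additional Raft rule that a leader only commits entries from its own current term by counting replicas.

```go
package main

import (
	"fmt"
	"sort"
)

// CommitIndex returns the highest log index stored on a majority of nodes.
// matchIndex holds one value per node (leader included): the highest index
// known to be replicated on that node. The name is an illustrative assumption.
func CommitIndex(matchIndex []int) int {
	if len(matchIndex) == 0 {
		return 0
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	majority := len(sorted)/2 + 1
	// In descending order, the value at position majority-1 is held
	// by at least `majority` nodes.
	return sorted[majority-1]
}

func main() {
	// Five nodes: index 5 is stored on three of them, index 7 on only two.
	fmt.Println(CommitIndex([]int{7, 5, 7, 3, 4})) // 5
}
```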

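// AppendEntries RPC arguments, sent by the leader to each follower.
// With no entries, the same RPC serves as a heartbeat.
type AppendEntriesArgs struct {
  Term         int        // leader's current term
  LeaderId     int        // lets followers redirect clients to the leader
  PrevLogIndex int        // index of the entry immediately before Entries
  PrevLogTerm  int        // term of the entry at PrevLogIndex
  Entries      []LogEntry // entries to store (empty for a heartbeat)
  LeaderCommit int        // leader's commit index
}

// Minimal LogEntry so the struct above compiles; the Command type is an assumption.
type LogEntry struct {
  Term    int    // term in which the leader received the command
  Command []byte // opaque client command
}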

// The follower rejects the request if its log has no entry at
// PrevLogIndex with term PrevLogTerm. The leader then backs up
// one entry and retries.
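
The follower side of this exchange might look like the sketch below. It assumes 1-based log indices (index 0 meaning "before the first entry"), a string Command, and hypothetical names Follower and HandleAppendEntries. Persistence, timers, and networking are left out. The main function plays the leader, backing up one entry at a time until the follower accepts.

```go
package main

import "fmt"

type LogEntry struct {
	Term    int
	Command string // a string here purely for readability
}

type AppendEntriesArgs struct {
	Term, LeaderId            int
	PrevLogIndex, PrevLogTerm int
	Entries                   []LogEntry
	LeaderCommit              int
}

// Follower keeps a 1-based log: Log[i-1] is the entry with index i.
type Follower struct {
	CurrentTerm int
	Log         []LogEntry
	CommitIndex int
}

// HandleAppendEntries reports whether the follower accepted the request.
func (f *Follower) HandleAppendEntries(a AppendEntriesArgs) bool {
	if a.Term < f.CurrentTerm {
		return false // request from a stale leader
	}
	f.CurrentTerm = a.Term
	// Consistency check: we must hold the entry just before the new ones.
	if a.PrevLogIndex > len(f.Log) {
		return false
	}
	if a.PrevLogIndex > 0 && f.Log[a.PrevLogIndex-1].Term != a.PrevLogTerm {
		return false
	}
	for i, e := range a.Entries {
		idx := a.PrevLogIndex + 1 + i
		if idx <= len(f.Log) {
			if f.Log[idx-1].Term == e.Term {
				continue // already stored
			}
			f.Log = f.Log[:idx-1] // conflict: drop this entry and all after it
		}
		f.Log = append(f.Log, e)
	}
	// Advance the commit index, but never past the last entry just verified.
	if last := a.PrevLogIndex + len(a.Entries); a.LeaderCommit > f.CommitIndex {
		f.CommitIndex = a.LeaderCommit
		if last < f.CommitIndex {
			f.CommitIndex = last
		}
	}
	return true
}

func main() {
	f := &Follower{CurrentTerm: 2, Log: []LogEntry{{1, "a"}, {1, "b"}, {2, "x"}}}
	leaderLog := []LogEntry{{1, "a"}, {1, "b"}, {3, "c"}, {3, "d"}}
	// The leader starts at its last entry and backs up after each rejection.
	for next := len(leaderLog); next >= 1; next-- {
		prev, prevTerm := next-1, 0
		if prev > 0 {
			prevTerm = leaderLog[prev-1].Term
		}
		ok := f.HandleAppendEntries(AppendEntriesArgs{
			Term: 3, LeaderId: 1, PrevLogIndex: prev, PrevLogTerm: prevTerm,
			Entries: leaderLog[prev:], LeaderCommit: 3,
		})
		fmt.Println("nextIndex", next, "accepted:", ok)
		if ok {
			break
		}
	}
	fmt.Println(len(f.Log), f.CommitIndex) // 4 3
}
```

In this run the first attempt (nextIndex 4) is rejected because the follower's entry at index 3 carries term 2 rather than 3. The second attempt (nextIndex 3) succeeds and overwrites the conflicting entry.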

3. Safety Guarantees

Raft maintains five key safety properties: election safety (at most one leader per term), leader append-only, log matching, leader completeness, and state machine safety. Together these guarantee that committed entries are never lost and every node applies the same sequence of commands.

  • Election Safety: at most one leader per term
  • Log Matching: two logs that hold an entry with the same index and term are identical in all entries up through that index