Subtracks & Tasks
Raft Log Replication
Implement Log Replication
Implement Raft log replication from leader to followers: 1. Leader receives client commands, appends to local log 2. Leader sends AppendEntries to al...
Ensure Log Matching Property
Implement the Log Matching safety property: On the follower side: 1. Receive AppendEntries with prevLogIndex, prevLogTerm 2. If entry at prevLogIndex...
Implement Entry Commitment
Implement log entry commitment: 1. Leader tracks matchIndex for each follower 2. For each index N, count how many nodes have matchIndex >= N 3. If ma...
Apply Committed Entries to State Machine
Apply committed log entries to the state machine: 1. Track lastApplied - highest entry applied to state machine 2. When commitIndex > lastApplied, ap...
Implement Election Restriction for Safety
Implement the Election Restriction to ensure safety: A candidate must have an "up-to-date" log to win election: 1. Voter compares own log to candidat...
Commitment and Application
Implement the Raft Commitment Rule
Implement the Raft commitment rule: an entry is committed when a majority of nodes have it in their log. The leader uses `matchIndex[]` to determine w...
Implement the Apply Channel for State Machine
Implement an apply channel that feeds committed log entries to the state machine. The state machine is a simple key-value store that processes `Put`, ...
Handle Leader Changes with No-Op on Election
When a new leader is elected, it must not apply uncommitted entries from previous terms. The "no-op on election" trick solves this: the new leader imm...
Add Snapshot Support for Log Compaction
Add snapshot support to compact the Raft log. When the log grows beyond a threshold, take a snapshot of the state machine. Followers that fall behind ...
Pass Linearizable KV with Network Partitions
The ultimate test: pass a linearizable key-value workload with 5 nodes under network partitions. This combines everything: leader election, log replic...
Paxos
Implement Single-Decree Paxos Phase 1 (Prepare/Promise)
Implement Phase 1 of single-decree Paxos (agree on one value). Phase 1 (Prepare/Promise): 1. Proposer selects a unique proposal number `n` 2. Propose...
Implement Paxos Phase 2 (Accept/Accepted)
Implement Phase 2 of Paxos. After getting a majority of promises in Phase 1, the proposer sends `Accept(n, v)` to acceptors. Value selection rule: if...
Prove Paxos Safety: Chosen Values Are Immutable
Prove that once a value is chosen in Paxos, all future proposals will also choose the same value. This is the core safety property. Implement a simul...
Implement Multi-Paxos for an Infinite Log
Extend single-decree Paxos to Multi-Paxos: an infinite log where each slot is a separate Paxos instance. Key optimization: once leadership is establi...
Compare Raft vs Multi-Paxos
Compare Raft and Multi-Paxos across multiple dimensions. Both solve the same problem (replicated log) but make different design tradeoffs. ```json Re...
Byzantine Fault Tolerance
Understand Byzantine Faults with Real-World Examples
What is a Byzantine fault? Unlike crash faults (node simply stops), Byzantine faults allow arbitrary misbehavior. A faulty node can lie, send contradi...
Implement Simplified PBFT with 4 Nodes
Implement a simplified PBFT (Practical Byzantine Fault Tolerance) with 4 nodes (f=1 Byzantine fault). PBFT Three-Phase Protocol: 1. **Pre-prepare**: ...
Detect and Handle Equivocation Attacks
Simulate a Byzantine node that sends contradictory messages to different peers (equivocation). Show that PBFT correctly handles this. ```json Request...
Prove the N >= 3f+1 Byzantine Fault Threshold
Prove that tolerating f Byzantine faults requires at least N >= 3f+1 nodes. Then verify empirically. The proof: - We need two quorums to overlap in a...
Implement Tendermint-Style BFT Voting Rounds
Research and implement Tendermint-style BFT voting (used in Cosmos blockchain). Tendermint simplifies PBFT with clear round structure and a rotating p...
Interview Prep
Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.
Rabbit Holes
For when you want to go deeper. Curated papers, posts, and talks beyond what this track covers.
Paxos Made Simple
Lamport's follow-up to the original Paxos paper, written to be readable. After building Raft, reading Paxos Made Simple reveals exactly what Raft's "understandability" claim means in practice.
Paxos Made Live (Google)
The Google paper on engineering Paxos into a production system (Chubby). The gap between the algorithm and the deployed system is large. This paper documents every piece of that gap.
Viewstamped Replication Revisited
Liskov and Cowling's reformulation of Viewstamped Replication. VR was invented at the same time as Paxos and is arguably cleaner. Understanding all three (Paxos, VR, Raft) gives you the full picture of state machine replication.
Heidi Howard's Flexible Paxos
A generalization that shows Paxos's quorum requirement is more flexible than "majority." You can use different quorum sizes for phase 1 and phase 2. This unlocks lower-latency configurations for geographically distributed clusters.