Subtracks & Tasks
Naive Broadcast (Flooding)
Implement Basic Broadcast to All Nodes
Implement a broadcast system where messages sent to any node eventually reach all nodes. Handle three message types: 1. topology: Tells you your neig...
Build Flat Tree Topology Gossip
Optimize your broadcast by using a tree topology. Instead of flooding to all neighbors, organize nodes into a spanning tree where each message travels...
Implement Peer-to-Peer Gossip with Random Neighbors
Implement a gossip protocol where each node randomly selects neighbors to share information with. This provides robustness against node failures while...
Add Message Batching to Reduce Network Overhead
Reduce network overhead by batching multiple messages into single transmissions. Instead of sending immediately, buffer messages and flush periodicall...
Handle Network Partition Healing and Resynchronization
Handle the scenario where network partitions heal and previously isolated nodes reconnect. Implement anti-entropy mechanisms to synchronize message se...
Gossip Protocol
Implement Gossip Fanout with Random Peer Selection
Gossip protocols spread information probabilistically: instead of broadcasting to all nodes, each node forwards to K random peers. This is more resili...
Calculate Minimum Fanout for Reliable Delivery
What fanout K guarantees that all N nodes receive a message with probability >= 0.99? The theory says that after R rounds of gossip with fanout K, the...
Add Periodic Gossip Rounds on a Timer
Instead of only gossiping when a new broadcast arrives, add **periodic gossip rounds**: every interval, pick K random peers and send them all your kno...
Implement Anti-Entropy with Digest Comparison
Full state sync wastes bandwidth when nodes are mostly in sync. **Anti-entropy** optimizes this: first exchange compact digests. Only transfer full st...
Tune Gossip Parameters for Maelstrom Broadcast
The Maelstrom broadcast workload requires messages-per-op < 30 under network partitions. Your task is to implement configurable gossip parameters and ...
Topology-Aware Gossip
Implement Tree-Based Broadcast Overlay
Random gossip wastes messages because nodes may receive duplicates. A **spanning tree** ensures each node receives exactly one copy: the root broadcas...
Handle Tree Node Failure with Direct Fallback
Tree broadcast is efficient but fragile: if a node crashes, all its descendants lose connectivity. Your task is to add **failure detection** with dire...
Implement Hybrid Tree and Gossip Broadcast
Pure tree broadcast is fast but fragile. Pure gossip is reliable but slow and wasteful. A **hybrid** approach uses tree for the first hop (fast, effic...
Simulate Network Partition and Healing
Network partitions split the cluster into isolated groups. After the partition heals, gossip must merge the diverged states. Your task is to simulate ...
Epidemic Algorithms and CRDT Gossip
Implement Grow-Only Set (G-Set) with Gossip
A **Grow-only Set (G-Set)** is the simplest CRDT. Elements can be added but never removed. Merge is set union, which is commutative, associative, and ...
Implement Two-Phase Set (2P-Set)
A **2P-Set** (two-phase set) supports both add and remove by maintaining two G-Sets: the add-set and the remove-set (tombstones). An element is in the...
Implement Last-Writer-Wins Key-Value Store
A **Last-Writer-Wins (LWW)** register resolves conflicts by always keeping the value with the latest timestamp. This is simple but can lose concurrent...
Demonstrate LWW Data Loss with Version Vectors
LWW silently loses data when two clients write concurrently. Your task is to demonstrate this and implement a version-vector alternative that detects ...
Benchmark Gossip KV Store Performance
Benchmark your gossip KV store to measure real-world performance characteristics: 1. **Convergence time**: How long until all replicas have the same ...
Interview Prep
Common interview questions for Distributed Systems / Infrastructure Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.
Questions are representative of real interview patterns. Model answers are starting points ā adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track ā and exactly how to fix them. Click any mistake to see the root cause and the correct approach.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.
Rabbit Holes
For when you want to go deeper. Curated papers, posts, and talks beyond what this track covers.
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
The paper behind Consul and HashiCorp's memberlist library. SWIM is gossip done carefully ā it explains why naive gossip has O(log n) detection latency and how to improve it.
Epidemic Algorithms for Replicated Database Maintenance
Demers et al., 1987. The paper that gave gossip protocols their name. The "anti-entropy" and "rumor mongering" terminology you see in Cassandra's documentation traces directly here.
How Cassandra Uses Gossip
Cassandra's official documentation on their gossip implementation. Covers how failure detection, schema propagation, and token metadata spread through a ring.
Phi Accrual Failure Detector
Instead of a binary "alive or dead" judgment, the phi accrual detector outputs a suspicion level that grows continuously with missed heartbeats. Cassandra adopted this in 2013.