Build Mini-Dynamo
Implement Amazon Dynamo from the 2007 paper. Build a fully functional distributed key-value store featuring consistent hashing, quorum replication, sloppy quorums, hinted handoff, vector clock conflict detection, Merkle tree anti-entropy, and read repair.
Amazon Dynamo
Published in 2007 at SOSP, Amazon's Dynamo paper defined a new category of distributed storage: highly available, eventually consistent, and always writable — even during network partitions. It powers Amazon's shopping cart and influenced every NoSQL database that followed, from Cassandra to Riak.
What You Will Build
You will implement every major subsystem of Dynamo from scratch, in six progressive stages:
- Consistent Hash Ring — position nodes and keys on a circular ring; add virtual nodes for uniform distribution; compute minimal key migrations when the topology changes.
- Quorum Replication — implement the NWR model; write to N replicas, succeed on W acknowledgements; read from N, return the highest-versioned value from R responses.
- Fault Tolerance — extend writes beyond preferred replicas using sloppy quorums; track hinted handoffs and deliver them on recovery.
- Vector Clocks — version every write with a vector clock; detect whether two versions are causally ordered or concurrently written (a conflict).
- Anti-Entropy — build Merkle tree bucket hashing to efficiently find which key ranges diverged between replicas; implement read-repair for on-demand healing.
The Core Trade-off
Dynamo sits firmly in the AP quadrant of the CAP theorem: it sacrifices strong consistency in exchange for availability and partition tolerance. Understanding how it does this — and what "eventually consistent" means in practice — is the central lesson of this project.
Prerequisites
You should be comfortable with Python or Go, basic data structures (sorted lists, dictionaries), and the concept of hashing. Prior knowledge of distributed systems is not required — this project builds it.
Tracks
Consistent Hash Ring
0/4 completedBuild the core data structure that distributes keys across nodes. Understand why consistent hashing outperforms modular hashing for dynamic clusters, and how virtual nodes eliminate hot spots.
Node Membership Changes
Quorum Replication
0/3 completedImplement the NWR quorum model that lets Dynamo tune its availability and consistency trade-off. Write to N replicas, require W acknowledgements. Read from N, return the highest-versioned answer from R responses.
Fault Tolerance
Conflict Resolution
0/4 completedBuild the versioning and reconciliation layer. Vector clocks track causal history across nodes. Merkle trees find diverged ranges efficiently. Read repair heals stale replicas without a dedicated background process.